
Fine-tune with hugging face trainer #97

Open
SangRyul opened this issue Apr 25, 2023 · 0 comments

Comments


SangRyul commented Apr 25, 2023

Hello.

First, thank you for your great work on this task; I have gained many insights from this project.
I'm just wondering:

  1. Does the genre-kilt model on Hugging Face differ from the model in this repository? If so, how are they different?

  2. I have a custom document-retrieval dataset in KILT style. How can I fine-tune the Hugging Face model? I would like to use the Hugging Face Trainer API. Can you give me some guidance?

  3. I also tried fine-tuning with this script; mine is below:

#!/bin/bash

# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

#DATASET=$1
#NAME=$2
DATASET=/userhomes/sangryul/project/contrastive-retrieval/GENRE/data_fair
BASED_MODEL=/userhomes/sangryul/project/contrastive-retrieval/GENRE/models/fairseq_wikipage_retrieval
NAME=nq_100_finetune
STEP=10000

fairseq-train $DATASET/bin/ \
    --wandb-project multiperspective \
    --no-epoch-checkpoints \
    --keep-best-checkpoints 1 \
    --save-dir /userhomes/sangryul/project/contrastive-retrieval/GENRE/models/$NAME \
    --restore-file $BASED_MODEL/model.pt \
    --arch bart_large  \
    --task translation  \
    --criterion label_smoothed_cross_entropy  \
    --source-lang source  \
    --target-lang target  \
    --truncate-source  \
    --label-smoothing 0.1  \
    --max-tokens 1024  \
    --update-freq 1  \
    --max-update $STEP  \
    --required-batch-size-multiple 1  \
    --dropout 0.1  \
    --attention-dropout 0.1  \
    --relu-dropout 0.0  \
    --weight-decay 0.01  \
    --optimizer adam  \
    --adam-betas "(0.9, 0.999)"  \
    --adam-eps 1e-08  \
    --clip-norm 0.1  \
    --lr-scheduler polynomial_decay  \
    --lr 3e-05  \
    --total-num-update $STEP  \
    --warmup-updates 500  \
    --num-workers 20  \
    --share-all-embeddings \
    --layernorm-embedding \
    --share-decoder-input-output-embed  \
    --skip-invalid-size-inputs-valid-test  \
    --log-format json  \
    --log-interval 10  \
    --patience 200
But I found that the training loss decreases while the evaluation loss increases.
I used the Natural Questions KILT train and dev datasets. Is this because of overfitting?
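To make question 2 concrete, here is the kind of outline I have in mind: converting KILT-style records into (source, target) pairs and wiring them into `Seq2SeqTrainer`. This is only a hypothetical sketch on my side, not code from this repo — the KILT field names follow the public KILT schema, the hyperparameters mirror my fairseq script above, and the helper names (`kilt_to_pairs`, `build_trainer`) are made up.

```python
def kilt_to_pairs(record):
    """Convert one KILT-style record into a (source, target) training pair.

    source: the input query; target: the gold Wikipedia page title that
    GENRE is trained to generate. Returns (source, None) if the record
    carries no provenance title.
    """
    source = record["input"]
    # Each output may carry provenance entries pointing at Wikipedia pages.
    for output in record.get("output", []):
        for prov in output.get("provenance", []):
            if "title" in prov:
                return source, prov["title"]
    return source, None


def build_trainer(train_pairs, eval_pairs, model_name="facebook/genre-kilt"):
    """Wire (source, target) pairs into a Seq2SeqTrainer.

    Imports are done lazily so kilt_to_pairs stays usable without
    transformers installed; all hyperparameters are assumptions that
    roughly mirror the fairseq script above.
    """
    from transformers import (
        BartForConditionalGeneration,
        BartTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)

    def encode(pairs):
        feats = []
        for src, tgt in pairs:
            enc = tokenizer(src, truncation=True, max_length=1024)
            enc["labels"] = tokenizer(
                text_target=tgt, truncation=True, max_length=128
            )["input_ids"]
            feats.append(enc)
        return feats

    args = Seq2SeqTrainingArguments(
        output_dir="genre_finetune",
        learning_rate=3e-5,           # same as --lr in the script above
        label_smoothing_factor=0.1,   # same as --label-smoothing
        warmup_steps=500,             # same as --warmup-updates
        max_steps=10000,              # same as --max-update
        weight_decay=0.01,            # same as --weight-decay
        evaluation_strategy="steps",
        load_best_model_at_end=True,  # keep the best dev checkpoint
    )
    return Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=encode(train_pairs),
        eval_dataset=encode(eval_pairs),
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
```

Would something along these lines be the intended way to use the Hugging Face checkpoint, or is there a recommended recipe?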

Thank you again for your effort on this project.
