
Fine-tune with hugging face trainer #97

Open
SangRyul opened this issue Apr 25, 2023 · 0 comments

Comments


SangRyul commented Apr 25, 2023

Hello.

First, thank you for your great work on this task; I have gained many insights from this project.
I'm just wondering:

  1. Does the genre-kilt model on Hugging Face differ from the model in this repository? If so, how are they different?

  2. I have a custom document-retrieval dataset in KILT style. How can I fine-tune the Hugging Face model? I would like to use the Hugging Face Trainer API. Can you give me some guidance?

  3. I also tried fine-tuning with this script; mine is below:

#!/bin/bash

# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

#DATASET=$1
#NAME=$2
DATASET=/userhomes/sangryul/project/contrastive-retrieval/GENRE/data_fair
BASED_MODEL=/userhomes/sangryul/project/contrastive-retrieval/GENRE/models/fairseq_wikipage_retrieval
NAME=nq_100_finetune
STEP=10000

fairseq-train $DATASET/bin/ \
    --wandb-project multiperspective \
    --no-epoch-checkpoints \
    --keep-best-checkpoints 1 \
    --save-dir /userhomes/sangryul/project/contrastive-retrieval/GENRE/models/$NAME \
    --restore-file $BASED_MODEL/model.pt \
    --arch bart_large  \
    --task translation  \
    --criterion label_smoothed_cross_entropy  \
    --source-lang source  \
    --target-lang target  \
    --truncate-source  \
    --label-smoothing 0.1  \
    --max-tokens 1024  \
    --update-freq 1  \
    --max-update $STEP  \
    --required-batch-size-multiple 1  \
    --dropout 0.1  \
    --attention-dropout 0.1  \
    --relu-dropout 0.0  \
    --weight-decay 0.01  \
    --optimizer adam  \
    --adam-betas "(0.9, 0.999)"  \
    --adam-eps 1e-08  \
    --clip-norm 0.1  \
    --lr-scheduler polynomial_decay  \
    --lr 3e-05  \
    --total-num-update $STEP  \
    --warmup-updates 500  \
    --num-workers 20  \
    --share-all-embeddings \
    --layernorm-embedding \
    --share-decoder-input-output-embed  \
    --skip-invalid-size-inputs-valid-test  \
    --log-format json  \
    --log-interval 10  \
    --patience 200
But I found that the training loss decreases while the evaluation loss increases.
I used the Natural Questions KILT train and dev datasets. Is this because of overfitting?
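To make question 2 concrete, here is the kind of outline I have in mind: converting KILT-style records into (source, target) pairs and wiring them into `Seq2SeqTrainer`. This is only a hypothetical sketch on my side, not code from this repo — the KILT field names follow the public KILT schema, the hyperparameters mirror my fairseq script above, and the helper names (`kilt_to_pairs`, `build_trainer`) are made up.

```python
def kilt_to_pairs(record):
    """Convert one KILT-style record into a (source, target) training pair.

    source: the input query; target: the gold Wikipedia page title that
    GENRE is trained to generate. Returns (source, None) if the record
    carries no provenance title.
    """
    source = record["input"]
    # Each output may carry provenance entries pointing at Wikipedia pages.
    for output in record.get("output", []):
        for prov in output.get("provenance", []):
            if "title" in prov:
                return source, prov["title"]
    return source, None


def build_trainer(train_pairs, eval_pairs, model_name="facebook/genre-kilt"):
    """Wire (source, target) pairs into a Seq2SeqTrainer.

    Imports are done lazily so kilt_to_pairs stays usable without
    transformers installed; all hyperparameters are assumptions that
    roughly mirror the fairseq script above.
    """
    from transformers import (
        BartForConditionalGeneration,
        BartTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)

    def encode(pairs):
        feats = []
        for src, tgt in pairs:
            enc = tokenizer(src, truncation=True, max_length=1024)
            enc["labels"] = tokenizer(
                text_target=tgt, truncation=True, max_length=128
            )["input_ids"]
            feats.append(enc)
        return feats

    args = Seq2SeqTrainingArguments(
        output_dir="genre_finetune",
        learning_rate=3e-5,           # same as --lr in the script above
        label_smoothing_factor=0.1,   # same as --label-smoothing
        warmup_steps=500,             # same as --warmup-updates
        max_steps=10000,              # same as --max-update
        weight_decay=0.01,            # same as --weight-decay
        evaluation_strategy="steps",
        load_best_model_at_end=True,  # keep the best dev checkpoint
    )
    return Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=encode(train_pairs),
        eval_dataset=encode(eval_pairs),
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
```

Would something along these lines be the intended way to use the Hugging Face checkpoint, or is there a recommended recipe?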

Thank you again for your effort on this project.
