Argument | Type | Default value | Description |
---|---|---|---|
dataset_path | str | "" | Path or URL of the dataset. If empty, download from S3. |
dataset_cache | str | './dataset_cache.bin' | Path or URL of the dataset cache |
model | str | "openai-gpt" | Path, URL or short name of the model |
num_candidates | int | 2 | Number of candidates for training |
max_history | int | 2 | Number of previous exchanges to keep in history |
train_batch_size | int | 4 | Batch size for training |
valid_batch_size | int | 4 | Batch size for validation |
gradient_accumulation_steps | int | 8 | Number of steps over which to accumulate gradients |
lr | float | 6.25e-5 | Learning rate |
lm_coef | float | 1.0 | Language-modeling (LM) loss coefficient |
mc_coef | float | 1.0 | Multiple-choice loss coefficient |
max_norm | float | 1.0 | Maximum gradient norm for clipping |
n_epochs | int | 3 | Number of training epochs |
personality_permutations | int | 1 | Number of permutations of personality sentences |
device | str | "cuda" if torch.cuda.is_available() else "cpu" | Device (cuda or cpu) |
fp16 | str | "" | Set to O0, O1, O2 or O3 for fp16 training (see the apex documentation) |
local_rank | int | -1 | Local rank for distributed training (-1: not distributed) |
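The options in this table map naturally onto command-line flags. The following is a minimal sketch, not the actual training script, of how they could be declared with Python's standard `argparse` module. The helpers `build_parser`, `effective_batch_size` and `combined_loss` are hypothetical. `combined_loss` assumes the LM and multiple-choice losses are combined as a weighted sum (`lm_coef`, `mc_coef`) and divided by `gradient_accumulation_steps`. The CPU fallback when PyTorch is not installed is also an assumption added so that the sketch runs anywhere.

```python
# Hypothetical sketch: an argparse parser mirroring the argument table above.
from argparse import ArgumentParser, Namespace

try:
    import torch
    DEFAULT_DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # assumption: fall back to CPU when PyTorch is missing
    DEFAULT_DEVICE = "cpu"


def build_parser() -> ArgumentParser:
    """Declare one flag per row of the table, with the same type and default."""
    parser = ArgumentParser()
    parser.add_argument("--dataset_path", type=str, default="",
                        help="Path or URL of the dataset. If empty, download from S3.")
    parser.add_argument("--dataset_cache", type=str, default="./dataset_cache.bin",
                        help="Path or URL of the dataset cache")
    parser.add_argument("--model", type=str, default="openai-gpt",
                        help="Path, URL or short name of the model")
    parser.add_argument("--num_candidates", type=int, default=2,
                        help="Number of candidates for training")
    parser.add_argument("--max_history", type=int, default=2,
                        help="Number of previous exchanges to keep in history")
    parser.add_argument("--train_batch_size", type=int, default=4,
                        help="Batch size for training")
    parser.add_argument("--valid_batch_size", type=int, default=4,
                        help="Batch size for validation")
    parser.add_argument("--gradient_accumulation_steps", type=int, default=8,
                        help="Number of steps over which to accumulate gradients")
    parser.add_argument("--lr", type=float, default=6.25e-5, help="Learning rate")
    parser.add_argument("--lm_coef", type=float, default=1.0,
                        help="Language-modeling (LM) loss coefficient")
    parser.add_argument("--mc_coef", type=float, default=1.0,
                        help="Multiple-choice loss coefficient")
    parser.add_argument("--max_norm", type=float, default=1.0,
                        help="Maximum gradient norm for clipping")
    parser.add_argument("--n_epochs", type=int, default=3,
                        help="Number of training epochs")
    parser.add_argument("--personality_permutations", type=int, default=1,
                        help="Number of permutations of personality sentences")
    parser.add_argument("--device", type=str, default=DEFAULT_DEVICE,
                        help="Device (cuda or cpu)")
    parser.add_argument("--fp16", type=str, default="",
                        help="Set to O0, O1, O2 or O3 for fp16 training (see apex documentation)")
    parser.add_argument("--local_rank", type=int, default=-1,
                        help="Local rank for distributed training (-1: not distributed)")
    return parser


def effective_batch_size(args: Namespace, world_size: int = 1) -> int:
    """Examples contributing to one optimizer step (hypothetical helper)."""
    return args.train_batch_size * args.gradient_accumulation_steps * world_size


def combined_loss(lm_loss: float, mc_loss: float, args: Namespace) -> float:
    """Weighted sum of both losses, scaled for gradient accumulation (assumption)."""
    weighted = args.lm_coef * lm_loss + args.mc_coef * mc_loss
    return weighted / args.gradient_accumulation_steps
```

For example, `build_parser().parse_args(["--n_epochs", "5"])` overrides a single option and keeps every other default. With the defaults, one process accumulates 4 × 8 = 32 examples per optimizer step; in distributed training, `world_size` (assumed here to be the number of processes) multiplies that further.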