v0.4.0
New Features (highlights)
- Streaming multipack for continued pre-training
- Mistral & Mixtral support
- Simplified Multipack for Mistral, Falcon, Qwen2, and Phi
- DPO/IPO/KTO-pairs RL training support via trl (see the DPO config sketch after the What's Changed list)
- Improved BatchSampler for multipack, enabling resume from checkpoint and data shuffling each epoch
- `bf16: auto` support (see the example config sketch below)
- MLflow support
- Save YAML configs to WandB
- Save predictions during evals to WandB
- More tests, including additional smoke tests for small-model training
- NEFTune support
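Several of these highlights are config-driven. The following is a minimal, untested sketch of how they might be combined in a single YAML; key names such as `neftune_noise_alpha`, `mlflow_experiment_name`, `evals_per_epoch`, and `saves_per_epoch` are assumed to match this release, so verify them against the docs for your installed version.

```yaml
# Hypothetical excerpt of an axolotl config exercising v0.4.0 highlights.
# Key names are assumed, not verified; check your version's docs.
base_model: mistralai/Mistral-7B-v0.1

sequence_len: 4096
sample_packing: true          # simplified multipack (Mistral, Falcon, Qwen2, Phi)
eval_sample_packing: false    # sample packing for evals is now configurable

bf16: auto                    # new bf16: auto support
flash_attention: true

neftune_noise_alpha: 5        # NEFTune (HF Transformers implementation)

evals_per_epoch: 4            # replaces hand-tuned eval_steps
saves_per_epoch: 1            # replaces hand-tuned save_steps
warmup_ratio: 0.1

wandb_project: my-project         # YAML config and eval predictions are logged to WandB
mlflow_experiment_name: my-exp    # new MLflow tracking support
```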
What's Changed
- document that packaging needs to be installed before flash-attn by @winglian in #559
- Fix pretraining with iterable/streaming Dataset by @jphme in #556
- Add training callback to send predictions to WandB table by @Glavin001 in #521
- fix wandb so mypy doesn't complain by @winglian in #562
- check for the existence of the default accelerate config that can create headaches by @winglian in #561
- add optimization for group-by-len by @winglian in #563
- gracefully handle length feature used for group by by @winglian in #565
- improve how we setup eval/save strategies and steps by @winglian in #547
- let hf trainer handle torch compile by @winglian in #516
- Model parallel by @winglian in #538
- fix save_steps so it doesn't get duplicated by @winglian in #567
- set auto for other params that hf trainer sets for ds. include zero1 json by @winglian in #570
- remove columns after tokenizing for pretraining by @winglian in #571
- mypy wandb ignore by @winglian in #572
- Phi examples by @winglian in #569
- e2e testing by @winglian in #574
- E2e device cuda by @winglian in #575
- E2e passing tests by @winglian in #576
- refactor scripts/finetune.py into new cli modules by @winglian in #550
- update support matrix with btlm and phi by @winglian in #579
- prevent cli functions from getting fired on import by @winglian in #581
- Fix Codellama examples by @Kimiko-AI in #582
- support custom field for completion from yml by @winglian in #580
- Feat(doc): Add features to doc by @NanoCode012 in #583
- Support Sample packing for phi arch by @winglian in #586
- don't resize embeddings if it's already large enough by @winglian in #577
- Enable full (non-sharded) model saving with SHARDED_STATE_DICT by @jphme in #584
- make phi training work with Loras by @winglian in #588
- optionally configure sample packing for evals by @winglian in #589
- don't add position_ids for evals when not using eval sample packing by @winglian in #591
- gather/broadcast the max value of the packing efficiency automatically by @winglian in #463
- Feat(data): Allow loading local csv and text by @NanoCode012 in #594
- add bf16 check by @winglian in #587
- btlm and falcon monkey patches for flash attn by @winglian in #566
- minor tweaks to simplify by @winglian in #597
- Fix for check with cfg and merge_lora by @winglian in #600
- improve handling for empty text on the tokenization step by @winglian in #502
- more sane defaults for openllama 3b used for quickstarts by @winglian in #602
- update dockerfile to not build evoformer since it fails the build by @winglian in #607
- Delete duplicate lines in models.py by @bofenghuang in #606
- support to disable exllama for gptq by @winglian in #604
- Update requirements.txt - Duplicated package by @Psancs05 in #610
- Only run tests when a change to python files is made by @maximegmd in #614
- Create multi-node.md by @maximegmd in #613
- fix distributed devices by @maximegmd in #612
- ignore wandb to resolve isort headaches by @winglian in #619
- skip the gpu memory checks if the device is set to 'auto' by @winglian in #609
- let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners by @winglian in #427
- run eval on the first step to get a baseline by @winglian in #617
- split completion text to sequence_len by @winglian in #616
- misc fixes to add gptq tests by @winglian in #621
- chore(callback): Remove old peft saving code by @NanoCode012 in #510
- update README w deepspeed info by @winglian in #605
- create a model card with axolotl badge by @winglian in #624
- better handling and logging of empty sharegpt turns by @winglian in #603
- tweak: improve base builder for smaller layers by @maximegmd in #500
- Feat(doc): Add eval_sample_packing to doc by @NanoCode012 in #625
- Fix: Fail bf16 check when running on cpu during merge by @NanoCode012 in #631
- default model changed by @mhenrichsen in #629
- Added quotes to the `pip install -e` command in the documentation to fix an incompatibility … by @Nan-Do in #632
- Feat: Add support for upstream FA2 by @NanoCode012 in #626
- eval_table isn't quite stable enough to be in default llama configs by @winglian in #637
- attention_mask not needed for training by @winglian in #642
- update for recent transformers updates by @winglian in #636
- use fastchat conversations template by @winglian in #578
- skip some flash attn patches unless explicitly enabled by @winglian in #643
- Correct typos in datasets.py by @felixonmars in #639
- Fix bug in dataset loading by @ethanhs in #284
- Warn users to login to HuggingFace by @Napuh in #645
- Mistral flash attn packing by @winglian in #646
- Fix(cfg): Add validation for save_strategy and eval_strategy by @NanoCode012 in #633
- Feat: Add example for Mistral by @NanoCode012 in #644
- Add mistral/README.md by @adarshxs in #647
- fix for flash attn w mistral w/o sample packing by @winglian in #648
- don't strip the prompt for check since we don't strip to tokenize anymore by @winglian in #650
- add support for defined train split by @winglian in #654
- Fix bug when using pretokenized datasets by @ein-ich in #652
- Make dataset_processes configurable by @corbt in #651
- add mistral e2e tests by @winglian in #649
- removed duplicate on requirements.txt by @Napuh in #661
- make sure we also run CI tests when requirements.txt changes by @winglian in #663
- prepared dataset caching, other misc fixes by @winglian in #665
- remove patch fix for phi by @winglian in #664
- refactor to set eval_batch_size earlier if unset, so we can warn if mismatched by @winglian in #662
- Feat: Add config yaml to section for reprod in bug-report.yaml by @NanoCode012 in #667
- Feat: Allow usage of native Mistral FA when no sample_packing by @NanoCode012 in #669
- chore: Clean up repetitive model kwargs by @NanoCode012 in #670
- Fix(version): Update FA to work with Mistral SWA by @NanoCode012 in #673
- Fix(tokenizer): Set rstrip,lstrip,norm to False by @NanoCode012 in #678
- Fix: Future deprecation warning with use_auth_token by @NanoCode012 in #680
- Feat: Set WORKDIR to /workspace/axolotl by @NanoCode012 in #679
- Fix: ValueError when FA + Mistral when padding_side=right by @NanoCode012 in #681
- flash_attention + sample packing for stablelm 3b by @winglian in #671
- Adding qlora config for Mistral by @TokenBender in #675
- Fix: Higher vram usage for mistral and sample_packing by @NanoCode012 in #691
- fix multiline for docker by @winglian in #694
- update mistral lr, sample pack by @mhenrichsen in #693
- apex not needed as amp is part of pytorch by @winglian in #696
- add docker images for pytorch 2.1.0 by @winglian in #697
- fix unneeded space by @mhenrichsen in #699
- Update README with some explanations by @seungduk-yanolja in #700
- Get qlora mistral-7b fine tuning working on a single 4090 by @lukemarsden in #708
- fix(doc): Add note on inference w sample packing by @NanoCode012 in #712
- Fix: lowercase `True` values in config by @atgctg in #713
- fix(doc): update default doc according to arg by @NanoCode012 in #714
- Save Axolotl config as WandB artifact by @jphme in #716
- improve handling of the prepared ds path and other cfg defaults by @winglian in #701
- fix pytorch 2.1.0 build, add multipack docs by @winglian in #722
- add noisy embedding by @maximegmd in #721
- pin xformers >= 0.0.22 by @winglian in #724
- misc sharegpt fixes by @winglian in #723
- workaround for installing xformers w torch 2.1.0 by @winglian in #725
- tweak for xformers install w pytorch 2.1.0 by @winglian in #727
- fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention by @winglian in #728
- Clarify custom format example by @casper-hansen in #729
- Mistral: Sliding Window Attention with Flash Attention and Sample Packing by @casper-hansen in #732
- badge by @mhenrichsen in #739
- catch ConnectionError when checking dataset from HuggingFace by @Napuh in #743
- Fix(model): Linear detected and added to target module with rope linear by @NanoCode012 in #738
- improve: Enhance code readability of prompt_tokenizers.py by @seungduk-yanolja in #707
- add a latest tag for regular axolotl image, cleanup extraneous print statement by @winglian in #746
- Fix DeepSpeed Zero 3 Saving by @tokestermw in #709
- chore: bump transformers to v4.34.1 to fix tokenizer issue by @NanoCode012 in #745
- add to docs by @winglian in #703
- Implement fused modules by @casper-hansen in #747
- remove lora fused packing test by @winglian in #758
- Fix: eval table conflict with eval_sample_packing by @NanoCode012 in #769
- Fix: Cannot tokenize with bf16 and on cpu by @NanoCode012 in #766
- Hotfix for fused QKV not saving the trained weights of o_proj by @casper-hansen in #762
- convert exponential notation lr to floats by @winglian in #771
- Fix: Warn when fullfinetune without adapter by @NanoCode012 in #770
- simplify by removing duplicate base_model_config by @winglian in #772
- disable eval table w sample packing in examples by @winglian in #778
- refactor setup trainer so we can add more hooks by @winglian in #773
- chore: refactor truthy check and fix mypy by @NanoCode012 in #780
- chore(readme): Improve documentation on conversation field by @NanoCode012 in #782
- Threaded MultipackDistributedDataloader with prefetched samples by @casper-hansen in #759
- Create preprocess CLI by @casper-hansen in #785
- Add docker advanced instruction to README by @gordicaleksa in #792
- Fix Deepspeed Zero3 Config by @teknium1 in #791
- Update to adapt to sharegpt datasets with "assistant" rather than "gp… by @MilesQLi in #774
- fix eval_steps to be a sane default by @winglian in #797
- refactor neft patch to be more re-usable similar to trl's impl by @winglian in #796
- fix(config): Set eos/bos to tokenizer if different by @NanoCode012 in #801
- feat(doc): add dummyoptim faq fix by @NanoCode012 in #802
- fix(tokenizer): update log order after update by @NanoCode012 in #806
- fix model parallel by @winglian in #816
- fix: pin autogptq by @NanoCode012 in #818
- update table for rwkv4 support, fix process count for dataset by @winglian in #822
- Feat: Added Gradio support by @Stillerman in #812
- Dockerfile: add deepspeed-kernels dependency for deepspeed>=0.12.0 by @fpreiss in #827
- cleanup verbosity a bit by @winglian in #799
- make sure to cleanup tmp output_dir for e2e tests by @winglian in #831
- multipack w batch sampler by @winglian in #795
- don't compile deepspeed or bitsandbytes from source by @winglian in #837
- Pin optimum package by @brthor in #838
- cleanup the old multipack dataloader by @winglian in #841
- include the suffix modified string in ascii art by @fpreiss in #852
- feat(doc): add more info on train_on_split by @NanoCode012 in #855
- chore(doc): Separate section on runpod by @NanoCode012 in #860
- various bugfixes by @winglian in #856
- adds llama and mistral dropout support by @winglian in #858
- multipack len should use max, not min by @winglian in #863
- Docs: add instructions to 1-click launching on public clouds by @concretevitamin in #862
- Update data.py for signature generation by @MilesQLi in #851
- lint fix that didn't get caught by linter by @winglian in #866
- make docker command more robust by @winglian in #861
- add e2e tests for checking functionality of resume from checkpoint by @winglian in #865
- allow overriding of model_config parameters from the YML by @winglian in #853
- Feat: Add dataset loading from S3, GCS by @NanoCode012 in #765
- try #2: pin hf transformers and accelerate to latest release, don't reinstall pytorch by @winglian in #867
- don't train if eval split is too small by @winglian in #873
- Phi update 202311 by @winglian in #876
- Install from git url by @msaroufim in #874
- fix: revert local dir dataset load by @NanoCode012 in #878
- chore(doc): Add info on changing role in sharegpt by @NanoCode012 in #886
- Feat: Add warmup_ratio by @NanoCode012 in #893
- fix: warning should not show if eval_batch_size not provided by @NanoCode012 in #896
- Feat: Add Qwen by @NanoCode012 in #894
- update datasets version to cut down the warnings due to pyarrow arg change by @winglian in #897
- fix: remove FA for qwen examples by @NanoCode012 in #900
- Determine FSDP/deepspeed settings on device select. by @kallewoof in #883
- ensure merged model matches the training dtype by @winglian in #902
- fix for qwen w lora by @winglian in #906
- Remove lr scheduler in DeepSpeed config to avoid conflict by @Haoxiang-Wang in #909
- feature: loss watchdog for terminating training runs that are failing by @kallewoof in #899
- Feat(wandb): Refactor to be more flexible by @NanoCode012 in #767
- Support device_map=sequential & max_memory config parameters by @brthor in #903
- feat: add check for quantized model by @NanoCode012 in #913
- Pin flash-attn to 2.3.3 by @casper-hansen in #919
- fix(tokenizer): handle fast tokenizer properly for bos/eos by @NanoCode012 in #914
- support for mamba by @winglian in #915
- fixing prompt template of chatml by removal of linebreak by @timothylimyl in #922
- Mixtral multipack by @winglian in #928
- update to latest transformers for mixtral support by @winglian in #929
- Mixtral: More correct MoE, lower loss by @casper-hansen in #932
- Update requirements.txt (fschat==0.2.34) by @tokestermw in #940
- Mixtral official by @winglian in #942
- Respect sequence_len in config for `type: llama2_chat` by @hamelsmu in #926
- new evals_per_epoch and saves_per_epoch to make things cleaner by @winglian in #944
- More hints on what to do with CUDA Out of memory errors by @jooray in #925
- fix: remove excessive newlines in system prompt(s) for alpaca by @kallewoof in #936
- Flash attn hotfix by @winglian in #951
- Fix Deepspeed loading by @winglian in #950
- fix: switch to using the HuggingFace Transformers NEFT implementation by @kallewoof in #941
- Add docs by @hamelsmu in #947
- Fix prompt assembly for llama by @hamelsmu in #952
- update transformers to fix checkpoint saving by @dumpmemory in #963
- update to latest nccl in docker image by @winglian in #965
- fix for build for nccl in dockerfile by @winglian in #970
- fix: add lr scheduler kwargs to Trainer by @NanoCode012 in #972
- Update README.md by @eltociear in #966
- Dockerfile torch fix by @winglian in #987
- fix mistral prompt assembly by @hamelsmu in #982
- Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens by @NanoCode012 in #787
- Add tests to Docker by @hamelsmu in #993
- change val size by @mhenrichsen in #992
- chore: Update transformers to latest by @NanoCode012 in #986
- support for cuda 12.1 by @winglian in #989
- set output_router_logits for mixtral config: by @winglian in #995
- Add an example config for finetuning a 34B model on a 24GB GPU by @evangriffiths in #1000
- FEAT: add tagging support to axolotl by @younesbelkada in #1004
- Set eval_sample_packing to false in mistral config.yaml by @kmsydney in #1003
- add config to model card by @hamelsmu in #1005
- remove landmark attn and xpos rope implementations by @winglian in #1010
- [Docs] Nit: clarify what inference is by @hamelsmu in #1012
- [Docs] Nit: Remind people to auth to wandb if they are going to use it by @hamelsmu in #1013
- feat: remove need to add load_in* during merge by @NanoCode012 in #1017
- feat: expose bnb kwargs by @NanoCode012 in #1018
- add ultrachat prompt strategies by @winglian in #996
- [WandB] Push axolotl config to top level wandb files by @hamelsmu in #1014
- Adds chat templates by @mhenrichsen in #1022
- Fix: bf16 support for inference by @taziksh in #981
- use recommended setting for use_reentrant w gradient checkpointing by @winglian in #1021
- added tiny llama examples for lora and qlora by @tdolan21 in #1027
- chore(readme): update instruction to set config to load from cache by @NanoCode012 in #1030
- [Docs] delete unused cfg value `lora_out_dir` by @hamelsmu in #1029
- fix: lint by @NanoCode012 in #1037
- chore(config): clean up old log for Qwen by @NanoCode012 in #1034
- bump transformers and update attention class map name by @winglian in #1023
- Added chatglm3 conversation type for training models like TinyLLama by @xaviviro in #1036
- fix HF model card upload for PEFT models by @hamelsmu in #1043
- Clean Up LorA Merge by @hamelsmu in #1044
- feature: better device mapping for large models by @kallewoof in #918
- feat: always push checkpoint to hub if set by @NanoCode012 in #1049
- Update tests-docker.yml by @hamelsmu in #1052
- streaming multipack for pretraining dataset by @jinwonkim93 in #959
- Simplify Docker Unit Test CI by @hamelsmu in #1055
- Phi2 rewrite by @winglian in #1058
- Efficiently get the length of the tokenized docs by @RicardoDominguez in #1063
- Sponsors by @winglian in #1065
- Update FUNDING.yml for Kofi link by @winglian in #1067
- fix: torch_dtype mistral default to fp32 by @NanoCode012 in #1050
- Cosine learning rate schedule - minimum learning rate by @RicardoDominguez in #1062
- fix double eos token for chatml by @winglian in #1054
- Add: mlflow for experiment tracking by @JohanWork in #1059
- update peft to 0.7.0 by @mtenenholtz in #1073
- paired kto support by @winglian in #1069
- Separate AutoGPTQ dep to `pip install -e .[auto-gptq]` by @casper-hansen in #1077
- attempt to also run e2e tests that need gpus by @winglian in #1070
- Update FUNDING.yml with bitcoin by @winglian in #1079
- swap the data collator for evals if not using sample packing by @winglian in #1076
- be more robust about checking embedding modules for lora finetunes by @winglian in #1074
- fix: `train_on_inputs: true` ignored for sharegpt by @NanoCode012 in #1045
- update sharegpt conversations when chatml chat template is set by @winglian in #1075
- additional logging to get maximum token length of a sequence in the dataset by @winglian in #1066
- pin accelerate for deepspeed fix by @winglian in #1080
- fix: warn user to install mamba_ssm package by @NanoCode012 in #1019
- use tags again for test image, only run docker e2e after pre-commit checks by @winglian in #1081
- optimize calculation of cu_seqlens from position_ids by @winglian in #1084
- add python 3.11 to the matrix for unit tests by @winglian in #1085
- Remove fused-dense-lib from requirements.txt by @casper-hansen in #1087
- misc fixes from #943 by @winglian in #1086
- add gptneox embeddings, fix phi2 inputs, also fix the casting by @winglian in #1083
- Add Debugging Guide by @hamelsmu in #1089
- Fix debugging.md by @hamelsmu in #1091
- feat: enable trl's autounwrap by @NanoCode012 in #1060
- Fix broken pypi.yml by @msaroufim in #1099
- Update README.md by @hamelsmu in #1103
- Add section for debugging with Docker by @hamelsmu in #1104
- Add link on README to Docker Debugging by @hamelsmu in #1107
- keep gate in fp32 for loras by @winglian in #1105
- Fix debugging video by @hamelsmu in #1111
- Disable caching on `--disable_caching` in CLI by @casper-hansen in #1110
- Reverse caching PR by @casper-hansen in #1115
- Enable or disable bf16 support based on availability by @simhallq in #1116
- update PR template so we can capture twitter or discord handles by @winglian in #1121
- pin model_revision for phi2 by @winglian in #1123
- fix(readme): clarify custom user prompt [no-ci] by @NanoCode012 in #1124
- Add `layers_to_transform` for `lora_config` by @xzuyn in #1118
- Agnostic cloud gpu docker image and Jupyter lab by @winglian in #1097
- Preprocess dataset size fix by @winglian in #1131
- fix(preprocess): Make sure dataset not loaded from cache when using preprocess cli by @NanoCode012 in #1136
- fix bf16 check when preprocessing data by @winglian in #1140
- Add shifted sparse attention by @joecummings in #973
- Multipack simplify for Mixtral by @winglian in #1142
- Fix link for Minotaur model by @joecummings in #1146
- Dockerfile cloud ports by @winglian in #1148
- fix check for env var by @winglian in #1151
- feat(dataset): add config to keep processed dataset in memory by @NanoCode012 in #1152
- Deprecate max packed sequence len by @winglian in #1141
- make sure the model config loader respects the model_revision too by @winglian in #1160
- Qwen2 by @winglian in #1166
- jupyter lab fixes by @winglian in #1139
- set fp16 to false if bf16, update bf16: auto in example YAMLs by @winglian in #1122
- Add mlflow callback for pushing config to mlflow artifacts by @JohanWork in #1125
- improve vram use w gradient checkpointing by @winglian in #1167
- Vram fix attempt by @winglian in #1164
- add commit message option to skip docker image builds in ci by @winglian in #1168
- Falcon embeddings by @winglian in #1149
- support for explicit test_dataset definition for evals by @winglian in #786
- Add desc to map/filter by @casper-hansen in #1162
- Feat(test): Add tests for alpaca chatml prompt tokenizer by @JohanWork in #1088
- DPO cleanup by @winglian in #1126
- Update README.md by @singhay in #1169
- Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) by @Tilemachoc in #1155
- don't fail if can't cast weights due to offload when merging by @winglian in #1172
- update docs by @winglian in #1176
- Phi2 multipack by @winglian in #1173
- DPO fixes v2 by @winglian in #1174
- Docs: RLHF Update after cleanup by @AlekseyKorshuk in #1178
- Add support for offline mode with HF_HUB_OFFLINE envvar by @JamesHWade in #1182
- Fix do_merge_lora raises an Exception in transformers v4.37.0 by @tisorlawan in #1184
- report min length of tokenized data by @winglian in #1186
- more dpo fixes for dataset loading and docs by @winglian in #1185
- upgrade deepspeed to 0.13.1 for mixtral fixes by @winglian in #1189
- Standardize system prompt format for AlpacaPrompter (instruct case) by @sadaisystems in #1190
- Mixtral fixes 20240124 by @winglian in #1192
- prepare for release v0.4.0 by @winglian in #1175
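For the trl-backed DPO/IPO/KTO-pairs support highlighted above, a stanza along the following lines should select the DPO path; the dataset path and `type` value are illustrative assumptions rather than a tested recipe, so adapt them to a preference-pair format your axolotl version supports.

```yaml
# Hypothetical DPO stanza for the new trl-backed RL training support.
# Dataset path and type are placeholders, not a verified recipe.
rl: dpo
datasets:
  - path: Intel/orca_dpo_pairs   # example preference-pair dataset
    split: train
    type: chatml.intel           # assumed prompt-format key
```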
New Contributors
- @Kimiko-AI made their first contribution in #582
- @bofenghuang made their first contribution in #606
- @Psancs05 made their first contribution in #610
- @Nan-Do made their first contribution in #632
- @felixonmars made their first contribution in #639
- @Napuh made their first contribution in #645
- @adarshxs made their first contribution in #647
- @ein-ich made their first contribution in #652
- @corbt made their first contribution in #651
- @TokenBender made their first contribution in #675
- @seungduk-yanolja made their first contribution in #700
- @lukemarsden made their first contribution in #708
- @atgctg made their first contribution in #713
- @casper-hansen made their first contribution in #729
- @tokestermw made their first contribution in #709
- @gordicaleksa made their first contribution in #792
- @MilesQLi made their first contribution in #774
- @Stillerman made their first contribution in #812
- @fpreiss made their first contribution in #827
- @brthor made their first contribution in #838
- @concretevitamin made their first contribution in #862
- @msaroufim made their first contribution in #874
- @kallewoof made their first contribution in #883
- @Haoxiang-Wang made their first contribution in #909
- @timothylimyl made their first contribution in #922
- @hamelsmu made their first contribution in #926
- @jooray made their first contribution in #925
- @dumpmemory made their first contribution in #963
- @eltociear made their first contribution in #966
- @evangriffiths made their first contribution in #1000
- @younesbelkada made their first contribution in #1004
- @kmsydney made their first contribution in #1003
- @taziksh made their first contribution in #981
- @tdolan21 made their first contribution in #1027
- @xaviviro made their first contribution in #1036
- @jinwonkim93 made their first contribution in #959
- @RicardoDominguez made their first contribution in #1063
- @JohanWork made their first contribution in #1059
- @mtenenholtz made their first contribution in #1073
- @simhallq made their first contribution in #1116
- @xzuyn made their first contribution in #1118
- @joecummings made their first contribution in #973
- @singhay made their first contribution in #1169
- @Tilemachoc made their first contribution in #1155
- @AlekseyKorshuk made their first contribution in #1178
- @JamesHWade made their first contribution in #1182
- @tisorlawan made their first contribution in #1184
- @sadaisystems made their first contribution in #1190
Full Changelog: v0.3.0...v0.4.0