NanoCode012 changed the title from "[Bug] Resuming training on a pretraining loop does not continue from where it left off" to "[Bug] Resuming training on a pretraining loop does not continue data loading from where it left off" on Jan 3, 2025
Please check that this issue hasn't been reported before.
Expected Behavior
Loads the dataset, advances to the last sample processed in the prior run, and continues training from there.
Current behaviour
Loads the dataset as usual, does not restore the dataloader to the point reached in the prior run, and continues training from the beginning of the data.
Steps to reproduce
Start training, stop it, then resume from the saved checkpoint.
Config yaml
No response
Possible solution
We need to save the dataloader's position (state) when checkpointing and restore it when training resumes; see the sketch after the link below.
https://huggingface.co/docs/datasets/v3.2.0/stream#save-a-dataset-checkpoint-and-resume-iteration
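A minimal sketch of the API described in the linked `datasets` docs, using a tiny toy iterable dataset instead of a real pretraining corpus; how this state gets wired into axolotl's checkpoint save/resume path is left open and not shown here:

```python
from datasets import Dataset

def build_stream():
    # Stand-in for the streamed pretraining dataset; IterableDataset
    # exposes state_dict()/load_state_dict() for resumable iteration.
    return Dataset.from_dict(
        {"text": [f"sample {i}" for i in range(10)]}
    ).to_iterable_dataset(num_shards=2)

ds = build_stream()
state = None
for step, example in enumerate(ds):
    # ... training step ...
    if step == 4:
        # Save the iteration state alongside the trainer checkpoint.
        state = ds.state_dict()
        break

# On resume: rebuild the dataset the same way, then restore its state so
# iteration continues after the last consumed sample instead of restarting.
resumed = build_stream()
resumed.load_state_dict(state)
for example in resumed:
    print(example["text"])
```

On the trainer side this could presumably be paired with the `StatefulDataLoader` from `torchdata`, which the same docs page mentions for saving and restoring dataloader state.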
Which Operating Systems are you using?
Python Version
3.11
axolotl branch-commit
main
Acknowledgements