[Bug] Resuming training on a pretraining loop does not continue data loading from where it left off #2229

Open
6 of 8 tasks
NanoCode012 opened this issue Jan 2, 2025 · 0 comments
Labels
bug Something isn't working

NanoCode012 (Collaborator) commented Jan 2, 2025

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

When training is resumed, the data loader loads the dataset, skips ahead to the last sample processed in the previous run, and continues training from there.

Current behaviour

When training is resumed, the dataset is loaded as usual, but the data loader does not restore its previous position. Training then continues without picking up where the data loading left off.

Steps to reproduce

Start a pretraining run, stop it partway through, then resume training.

Config yaml

No response

Possible solution

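We need to save the data loader's position when checkpointing and restore it when training resumes. The Hugging Face datasets streaming docs describe a checkpoint-and-resume mechanism for this: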

https://huggingface.co/docs/datasets/v3.2.0/stream#save-a-dataset-checkpoint-and-resume-iteration
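
A minimal sketch of the save-and-restore idea, in plain Python so it runs without any dependencies. `ResumableStream`, `run_interrupted`, and `resume` are hypothetical names for illustration only; they are not axolotl or datasets code. The method names mirror the `state_dict()` / `load_state_dict()` pair that the linked docs describe for iterable datasets, but the state layout here (a single index) is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple


class ResumableStream:
    """Hypothetical stand-in for a streaming dataset that can checkpoint its position."""

    def __init__(self, samples: Sequence[int]) -> None:
        self.samples = samples
        self._next_index = 0  # index of the next sample to yield

    def __iter__(self) -> Iterator[int]:
        while self._next_index < len(self.samples):
            sample = self.samples[self._next_index]
            # Advance before yielding, so a checkpoint taken after this
            # sample is consumed points at the following one.
            self._next_index += 1
            yield sample

    def state_dict(self) -> dict:
        """Return the position to store alongside the training checkpoint."""
        return {"next_index": self._next_index}

    def load_state_dict(self, state: dict) -> None:
        """Restore a position previously returned by state_dict()."""
        self._next_index = state["next_index"]


def run_interrupted(samples: Sequence[int], stop_after: int) -> Tuple[List[int], dict]:
    """Consume `stop_after` samples, then stop and return what was seen plus the saved state."""
    stream = ResumableStream(samples)
    seen: List[int] = []
    for sample in stream:
        seen.append(sample)
        if len(seen) == stop_after:
            break
    return seen, stream.state_dict()


def resume(samples: Sequence[int], state: dict) -> List[int]:
    """Build a fresh stream, restore the saved position, and consume the rest."""
    stream = ResumableStream(samples)
    stream.load_state_dict(state)
    return list(stream)


data = list(range(6))
first_run, checkpoint = run_interrupted(data, stop_after=3)
second_run = resume(data, checkpoint)
```

Together the two runs see every sample exactly once. Without the `load_state_dict` call, the second run would start again from the first sample, which is the behaviour reported above.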

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
NanoCode012 added the bug label on Jan 2, 2025
NanoCode012 changed the title from "[Bug] Resuming training on a pretraining loop does not continue from where it left off" to "[Bug] Resuming training on a pretraining loop does not continue data loading from where it left off" on Jan 3, 2025