Hope for a more detailed README! #6
Comments
zhangdan0602 +1. I would appreciate it if you could add some instructions for training the models.
We have updated README.md to illustrate the details. Specifically, In addition, you can download [$D_{V_0}$] and put them in
Thanks!
It is difficult for me to match the self-training process in the paper with the files in the code base. Could you provide a more detailed technical report on training?
@zhangdan0602
Sorry to interrupt! I really appreciate your work, but I can't run either inference or self-training based on the README.
For inference, I followed the README but failed to run evaluate.py. What are VALUE_BASE_MODEL_DIR and VALUE_MODEL_STATE_DICT? In addition, the model you released on HF, zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st, seems to have a problem. I've tried many times, but it reports an error when loading the checkpoint shards:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
Someone also raised this question on HF.
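In case it helps anyone hitting the same error: as far as I understand, MetadataIncompleteBuffer usually means the tensor offsets in the safetensors header do not match the size of the file, which often points to a truncated download or a Git LFS pointer file sitting where the real shard should be. Below is a minimal sketch that checks a shard using only the standard library. It assumes the documented safetensors layout (an 8-byte little-endian header length, a JSON header, then the tensor data); the function name `check_safetensors_file` is my own and is not part of this repo or of the safetensors package.

```python
import json
import struct
from pathlib import Path


def check_safetensors_file(path):
    """Return (ok, message) saying whether a .safetensors file looks complete.

    Hypothetical helper: it only compares sizes and does not load any tensors.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False, "file is shorter than the 8-byte length prefix"
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header
        (header_len,) = struct.unpack("<Q", prefix)
        if 8 + header_len > size:
            return False, "header length exceeds file size (truncated or not a safetensors file)"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:
            return False, "header is not valid JSON"
    if not isinstance(header, dict):
        return False, "header is not a JSON object"
    # data_offsets are relative to the start of the data section, so the
    # largest end offset is the expected length of that section
    data_end = max(
        (
            entry["data_offsets"][1]
            for name, entry in header.items()
            if name != "__metadata__" and isinstance(entry, dict) and "data_offsets" in entry
        ),
        default=0,
    )
    expected = 8 + header_len + data_end
    if expected != size:
        return False, f"expected {expected} bytes, found {size}"
    return True, "header and data section sizes are consistent"
```

Running this on each downloaded shard should show whether the local copy is incomplete. If a shard fails, re-downloading it is the first thing to try; if a fresh download fails the same way, the uploaded file itself is probably the problem.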
For training, the README says nothing about it.
I hope you can update the README and perhaps double-check your HF model. Thank you very much!