
Hope for a more detailed README! #6

Open
PKUfreshman opened this issue Oct 11, 2024 · 5 comments
Labels
about readme: Improvements or additions to documentation

Comments

@PKUfreshman

Sorry to interrupt! I really appreciate your work, but I can't do either inference or self-training based on the README.
For inference, I followed the README but failed to run evaluate.py. What are VALUE_BASE_MODEL_DIR and VALUE_MODEL_STATE_DICT? Also, the model you released on HF, zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st, seems to have a problem. I've tried many times, but it reports an error when loading the checkpoint shards:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
Someone has also raised this question on HF.
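
(Note: this safetensors error usually indicates a shard that was only partially downloaded, although it can also mean the uploaded file itself is truncated. Below is a minimal sketch of forcing a clean re-download with huggingface_hub, assuming the default HF cache is used; it is worth trying before concluding the upstream checkpoint is broken.)

```python
# Re-fetch every file of the checkpoint, discarding any partially
# written shards already sitting in the local Hugging Face cache.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st",
    force_download=True,
)
```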

For training, the README says nothing about it.

I hope you can update the README and perhaps double-check the HF model. Thank you very much!

@sarvghotra

sarvghotra commented Oct 11, 2024

@zhangdan0602 +1. I would appreciate it if you could add some instructions for training the models.

@zhangdan0602
Collaborator

We have updated README.md to illustrate the details. Specifically, VALUE_BASE_MODEL_DIR is the local path to the value model's backbone. Because of differing transformers dependency versions, Mistral-7B is adopted as the backbone of the value model when the policy model is Llama3-8B-Instruct or MetaMath-Mistral-7B, and ChatGLM3-6B is used as the backbone when the policy model is SciGLM.

In addition, you can download [$D_{V_0}$] and put them in PRM/data to train Mistral-7B as the initial process reward model and obtain VALUE_MODEL_STATE_DICT. We also provide the training scripts PRM/train_VM_chatglm.py and PRM/train_VM_mistral.py.
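
For reference, here is a minimal sketch of how these two paths are typically consumed. The variable names follow the description above, but the paths are placeholders and the exact loading code in evaluate.py may differ (for example, the PRM likely wraps the backbone with an extra value head):

```python
# Hypothetical sketch only: illustrates where VALUE_BASE_MODEL_DIR and
# VALUE_MODEL_STATE_DICT fit; it is not the repo's exact loading code.
import torch
from transformers import AutoModel, AutoTokenizer

VALUE_BASE_MODEL_DIR = "/path/to/Mistral-7B"           # backbone (ChatGLM3-6B when the policy is SciGLM)
VALUE_MODEL_STATE_DICT = "/path/to/prm_checkpoint.pt"  # weights trained with PRM/train_VM_mistral.py

tokenizer = AutoTokenizer.from_pretrained(VALUE_BASE_MODEL_DIR)
value_model = AutoModel.from_pretrained(VALUE_BASE_MODEL_DIR)

# strict=False because the fine-tuned PRM checkpoint may contain a value/score
# head that the bare backbone does not define.
state_dict = torch.load(VALUE_MODEL_STATE_DICT, map_location="cpu")
value_model.load_state_dict(state_dict, strict=False)
value_model.eval()
```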

@PKUfreshman
Author

Thanks!

@Majiawei

It is difficult for me to match the self-training process described in the paper with the files in the codebase. Could you provide a more detailed technical report on training?

zhangdan0602 added the "about readme" label on Dec 25, 2024
@Takeshiddd

Takeshiddd commented Jan 8, 2025

@zhangdan0602
Thank you for your excellent work.
I would also like to replicate the self-training with MCTS* as implemented in the paper. I would greatly appreciate it if you could share the code or provide a description of how to reproduce the ReST-MCTS* training from the paper.
