What is VALUE_MODEL_STATE_DICT? #5

Open
dszpr opened this issue Oct 11, 2024 · 3 comments
Labels
about readme Improvements or additions to documentation

Comments

@dszpr

dszpr commented Oct 11, 2024

Thanks for the great work! I tried to run the MCTS* search following the README, and I wonder what VALUE_MODEL_STATE_DICT is.
Also, I noticed that you uploaded a model to HF, 'zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st'. Is it an inference model or a value model?
Looking forward to your reply!

@zhangdan0602
Collaborator

zhangdan0602 commented Oct 15, 2024

  1. You can download [$D_{V_0}$] and put the files in PRM/data, then train Mistral-7B as the initial process reward model to obtain VALUE_MODEL_STATE_DICT.
    We also provide the training scripts PRM/train_VM_chatglm.py and PRM/train_VM_mistral.py.
  2. 'zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st' is an inference model.
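For readers unfamiliar with the term: in PyTorch, a state dict is the mapping from parameter names to weight tensors that `torch.save` writes to disk, so VALUE_MODEL_STATE_DICT presumably points to such a file produced by the PRM training step. The sketch below only illustrates the general save/load pattern. `TinyValueModel`, `save_value_model`, `load_value_model`, and the file name are hypothetical stand-ins, not the repository's code, and the real value model is built on Mistral-7B or ChatGLM rather than a single linear layer.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyValueModel(nn.Module):
    """Hypothetical stand-in for a value model: features -> one scalar score."""

    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # One scalar value per input row.
        return self.value_head(features).squeeze(-1)


def save_value_model(model: nn.Module, path: str) -> None:
    # Save only the weights (the state dict), not the whole Python object.
    torch.save(model.state_dict(), path)


def load_value_model(path: str, hidden_size: int = 16) -> nn.Module:
    # Rebuild the architecture first, then load the saved weights into it.
    model = TinyValueModel(hidden_size)
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()
    return model


# Hypothetical path; in practice this would be the file written by the
# PRM training script.
VALUE_MODEL_STATE_DICT = os.path.join(tempfile.gettempdir(), "tiny_value_model.pt")

trained = TinyValueModel()
save_value_model(trained, VALUE_MODEL_STATE_DICT)
restored = load_value_model(VALUE_MODEL_STATE_DICT)
```

The point of the pattern is that the file holds weights only, so the loading side has to construct the same architecture before calling `load_state_dict`.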

@dszpr
Author

dszpr commented Oct 21, 2024

Thanks! I noticed that you updated the code last week. May I ask what the two JSON files llama_local_critic_dpo.json and mistral_local_critic_dpo.json are, and where to find them? They are mentioned in https://github.com/THUDM/ReST-MCTS/blob/main/self_train/self_train_dpo.py

@thunder95

> Thanks! I noticed that you updated the code last week. May I ask what the two JSON files llama_local_critic_dpo.json and mistral_local_critic_dpo.json are, and where to find them? They are mentioned in https://github.com/THUDM/ReST-MCTS/blob/main/self_train/self_train_dpo.py

Same issue. How do I make the DPO dataset?

@zhangdan0602 zhangdan0602 added the about readme Improvements or additions to documentation label Dec 25, 2024
3 participants