What is VALUE_MODEL_STATE_DICT? #5

dszpr · 2024-10-11T01:26:27Z

Thanks for the great work! I tried to run the MCTS* search following the README, and I wonder what VALUE_MODEL_STATE_DICT is.
Besides, I noticed that you uploaded a model to HF, 'zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st'. Is it an inference model or a value model?
Looking forward to your reply!

zhangdan0602 · 2024-10-15T05:33:56Z

You can download [$D_{V_0}$] and put it in PRM/data, then train Mistral-7B as the initial process reward model to obtain VALUE_MODEL_STATE_DICT.
We also provide the training scripts PRM/train_VM_chatglm.py and PRM/train_VM_mistral.py.
'zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st' is an inference model.
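
For readers unfamiliar with the term, a "state dict" in PyTorch is the dictionary of a module's learned tensors, usually written to disk with `torch.save` after training and restored with `load_state_dict`. The sketch below only illustrates that round trip, on the assumption that VALUE_MODEL_STATE_DICT is meant to point at such a file. `TinyValueModel`, `save_value_model`, `load_value_model`, and the checkpoint file name are hypothetical stand-ins, not code from this repository; the actual training is done by the PRM/train_VM_*.py scripts.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyValueModel(nn.Module):
    """Hypothetical stand-in for a value model: features -> score in (0, 1)."""

    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.value_head(features)).squeeze(-1)


def save_value_model(model: nn.Module, path: str) -> None:
    # Persist only the learned tensors, not the Python class itself.
    torch.save(model.state_dict(), path)


def load_value_model(path: str, hidden_size: int = 16) -> nn.Module:
    # Rebuild the architecture first, then copy the saved weights into it.
    model = TinyValueModel(hidden_size)
    state_dict = torch.load(path, map_location="cpu", weights_only=True)
    model.load_state_dict(state_dict)
    model.eval()
    return model


# Hypothetical checkpoint location; a real run would use the file
# produced by the training script instead.
ckpt_path = os.path.join(tempfile.mkdtemp(), "value_model_state_dict.pth")
trained = TinyValueModel()
save_value_model(trained, ckpt_path)
restored = load_value_model(ckpt_path)
```

Under this assumption, the path of the saved file is what would be supplied as VALUE_MODEL_STATE_DICT, alongside the base model directory that defines the architecture.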

dszpr · 2024-10-21T03:02:26Z

Thanks! I noticed that you updated the code last week. May I ask what the two JSON files llama_local_critic_dpo.json and mistral_local_critic_dpo.json are, and where to find them? They are mentioned in https://github.com/THUDM/ReST-MCTS/blob/main/self_train/self_train_dpo.py

thunder95 · 2024-10-22T08:46:31Z

> Thanks! I noticed that you updated the code last week. May I ask what the two JSON files llama_local_critic_dpo.json and mistral_local_critic_dpo.json are, and where to find them? They are mentioned in https://github.com/THUDM/ReST-MCTS/blob/main/self_train/self_train_dpo.py

Same issue. How do we build the DPO dataset?

zhangdan0602 added the "about readme" label (Improvements or additions to documentation) on Dec 25, 2024
