Thanks for the great work! I tried to run the MCTS* search following the README, and I wonder what VALUE_MODEL_STATE_DICT is.
Besides, I noticed that you uploaded a model on HF, 'zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st'. Is it an inference model or a value model?
Looking forward to your reply!
You can download [$D_{V_0}$] and put it in PRM/data, then train Mistral-7B on it as the initial process reward model. The resulting checkpoint is what VALUE_MODEL_STATE_DICT refers to.
We also provide the training scripts PRM/train_VM_chatglm.py and PRM/train_VM_mistral.py.
'zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st' is an inference model.
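For the VALUE_MODEL_STATE_DICT part of the answer, the sketch below illustrates the general PyTorch pattern of saving a trained model's state dict and loading it back later. It assumes VALUE_MODEL_STATE_DICT is a path to a checkpoint written with `torch.save(model.state_dict(), path)`, which the thread does not confirm. `TinyValueModel`, `save_value_model`, and `load_value_model` are hypothetical names used only for illustration; they are not the repository's classes, and the real value model is a fine-tuned Mistral-7B (or ChatGLM) rather than a single linear layer.

```python
import os
import tempfile

import torch
from torch import nn


class TinyValueModel(nn.Module):
    """Hypothetical stand-in for a value model: features in, scalar score out."""

    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Squash to (0, 1) so the output reads as a per-example score.
        return torch.sigmoid(self.value_head(features)).squeeze(-1)


def save_value_model(model: nn.Module, path: str) -> None:
    # Store only the weights (the state dict), not the whole pickled module.
    torch.save(model.state_dict(), path)


def load_value_model(path: str, hidden_size: int = 16) -> TinyValueModel:
    # Rebuild the architecture first, then copy the saved weights into it.
    model = TinyValueModel(hidden_size)
    state_dict = torch.load(path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()
    return model


if __name__ == "__main__":
    # Assumed location; in practice this would be wherever training wrote the file.
    checkpoint_path = os.path.join(tempfile.mkdtemp(), "value_model.pt")
    save_value_model(TinyValueModel(), checkpoint_path)
    restored = load_value_model(checkpoint_path)
    print(restored(torch.ones(2, 16)).shape)
```

The point of the pattern is that the checkpoint holds only weights, so the loading side must construct the same architecture that was trained before calling `load_state_dict`.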