
Hope for a more detailed README! #6

Open
PKUfreshman opened this issue Oct 11, 2024 · 5 comments
Labels
about readme: Improvements or additions to documentation

Comments

@PKUfreshman

Sorry to interrupt! I really appreciate your work, but I can't do either inference or self-training based on the README.
For inference, I followed the README but failed to run evaluate.py. What are VALUE_BASE_MODEL_DIR and VALUE_MODEL_STATE_DICT? Also, the model you released on HF, zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st, seems to have a problem. I've tried many times, but it reports an error when loading the checkpoint shards:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
Someone has also raised this question on HF.
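
(Note: this safetensors error usually indicates a shard that was only partially downloaded, although it can also mean the uploaded file itself is truncated. Below is a minimal sketch of forcing a clean re-download with huggingface_hub, assuming the default HF cache is used; it is worth trying before concluding the upstream checkpoint is broken.)

```python
# Re-fetch every file of the checkpoint, discarding any partially
# written shards already sitting in the local Hugging Face cache.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="zd21/ReST-MCTS-Llama3-8b-Instruct-Policy-1st",
    force_download=True,
)
```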

For training, the README says nothing about it.

I hope you can update the README and perhaps double-check the HF model. Thank you very much!

@sarvghotra

sarvghotra commented Oct 11, 2024

@zhangdan0602 +1. I would appreciate it if you could add some instructions for training the models.

@zhangdan0602
Collaborator

We have updated README.md to illustrate the details. Specifically, VALUE_BASE_MODEL_DIR is the local path to the value model's backbone. Because of differing transformers dependency versions, Mistral-7B is adopted as the backbone of the value model when the policy model is Llama3-8B-Instruct or MetaMath-Mistral-7B, and ChatGLM3-6B is used as the backbone when the policy model is SciGLM.

In addition, you can download [$D_{V_0}$] and put them in PRM/data to train Mistral-7B as the initial process reward model and obtain VALUE_MODEL_STATE_DICT. We also provide the training scripts PRM/train_VM_chatglm.py and PRM/train_VM_mistral.py.
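
For reference, here is a minimal sketch of how these two paths are typically consumed. The variable names follow the description above, but the paths are placeholders and the exact loading code in evaluate.py may differ (for example, the PRM likely wraps the backbone with an extra value head):

```python
# Hypothetical sketch only: illustrates where VALUE_BASE_MODEL_DIR and
# VALUE_MODEL_STATE_DICT fit; it is not the repo's exact loading code.
import torch
from transformers import AutoModel, AutoTokenizer

VALUE_BASE_MODEL_DIR = "/path/to/Mistral-7B"           # backbone (ChatGLM3-6B when the policy is SciGLM)
VALUE_MODEL_STATE_DICT = "/path/to/prm_checkpoint.pt"  # weights trained with PRM/train_VM_mistral.py

tokenizer = AutoTokenizer.from_pretrained(VALUE_BASE_MODEL_DIR)
value_model = AutoModel.from_pretrained(VALUE_BASE_MODEL_DIR)

# strict=False because the fine-tuned PRM checkpoint may contain a value/score
# head that the bare backbone does not define.
state_dict = torch.load(VALUE_MODEL_STATE_DICT, map_location="cpu")
value_model.load_state_dict(state_dict, strict=False)
value_model.eval()
```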

@PKUfreshman
Author

Thanks!

@Majiawei

It is difficult for me to match the self-training process described in the paper with the files in the codebase. Could you provide a more detailed technical report on training?

zhangdan0602 added the "about readme" label on Dec 25, 2024
@Takeshiddd

Takeshiddd commented Jan 8, 2025

@zhangdan0602
Thank you for your excellent work.
I would also like to replicate the self-training with MCTS* as implemented in the paper. I would greatly appreciate it if you could share the code or provide a description of how to reproduce the ReST-MCTS* training from the paper.
