Request for Release of Enhanced Theorem-Proving Dataset #2

Open
PrithwishJana opened this issue Aug 21, 2024 · 2 comments

Comments

@PrithwishJana

PrithwishJana commented Aug 21, 2024

Hi,

The paper looks impressive! Is there a plan to release the training dataset? I noticed that you used an enhanced theorem-proving dataset with 9,645k sequences, derived from DeepSeek-Prover-V1. Will the new dataset, including the natural language descriptions and intermediate tactic state information, be made available?

Additionally, I would greatly appreciate it if you could share the smaller dataset of 4.5k carefully selected instances used for reinforcement learning.

@fzyzcjy

fzyzcjy commented Aug 28, 2024

+1. Thanks, DeepSeek, for the great work! I would appreciate it if the dataset could be open-sourced.

@aldopareja

I would also love to see this, or a way to synthetically generate the data. Thank you, and great work indeed!
