Reduce the MMLU evaluation benchmark dataset to the minimum set of features #183

Open
markmc opened this issue Jul 23, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@markmc
Contributor

markmc commented Jul 23, 2024

Per #180, we believe the MMLU evaluation benchmark dataset only needs the question, choices, and answer features, because those are the only fields the lm-eval task uses:

doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: answer

We should update the generator to remove all features other than these three. That makes the requirements of the evaluation process clearer and reduces the size of the dataset.
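A minimal sketch of what that reduction could look like, assuming each generated record is a plain dict. `REQUIRED_FEATURES` and `reduce_record` are hypothetical names, not the generator's actual code, and the extra fields in the sample record are invented for illustration only.

```python
# Hypothetical helper: keep only the features the lm-eval task template reads.
REQUIRED_FEATURES = ("question", "choices", "answer")


def reduce_record(record):
    """Return a copy of `record` containing only the required features."""
    missing = [name for name in REQUIRED_FEATURES if name not in record]
    if missing:
        # Fail loudly rather than emit a record the eval task cannot use.
        raise KeyError(f"record is missing required features: {missing}")
    return {name: record[name] for name in REQUIRED_FEATURES}


sample = {
    "question": " What is 2 + 2? ",
    "choices": ["3", "4", "5", "6"],
    "answer": 1,
    "subject": "arithmetic",      # invented extra feature, for illustration
    "source_doc": "example.md",   # invented extra feature, for illustration
}

reduced = reduce_record(sample)
print(sorted(reduced))
```

If the generator builds a Hugging Face `datasets.Dataset`, dropping the unneeded columns with `Dataset.remove_columns` should have the same effect, though I have not checked how the generator constructs the dataset today.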

@markmc markmc changed the title Reduce the MMLU evaluation benchmark dataset to the minimum set of columns Reduce the MMLU evaluation benchmark dataset to the minimum set of features Jul 23, 2024
@nathan-weinberg nathan-weinberg added the enhancement New feature or request label Aug 20, 2024

This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

@github-actions github-actions bot added the stale label Nov 20, 2024
@bbrowning
Contributor

This still needs to be looked into. We should verify that, with the current state of eval, we don't need any other features of this dataset.
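One rough way to start that check, sketched in plain Python rather than through lm-eval itself: the functions below mirror the doc_to_text and doc_to_target entries quoted in the issue, and are applied to a record that has only the three features. The sample record is invented, and treating answer as an integer index into the A-D choices is an assumption. This only exercises the quoted template, not any other part of eval that might read the dataset.

```python
def doc_to_text(doc):
    # Plain-Python equivalent of the doc_to_text template quoted in the issue.
    choices = doc["choices"]
    return (
        f"{doc['question'].strip()}\n"
        f"A. {choices[0]}\nB. {choices[1]}\nC. {choices[2]}\nD. {choices[3]}\n"
        "Answer:"
    )


def doc_to_target(doc):
    # Mirrors `doc_to_target: answer`.
    return doc["answer"]


# Invented record carrying only the three features under discussion.
reduced_doc = {
    "question": " What is 2 + 2? ",
    "choices": ["3", "4", "5", "6"],
    "answer": 1,  # assumed to be an index into doc_to_choice
}

prompt = doc_to_text(reduced_doc)
target_letter = ["A", "B", "C", "D"][doc_to_target(reduced_doc)]
print(prompt)
print(target_letter)
```

If both calls succeed on a reduced record, the quoted task config at least does not depend on anything beyond question, choices, and answer.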

@github-actions github-actions bot removed the stale label Nov 21, 2024

3 participants