Hello, and thank you for the recent release of the style control leaderboard. I have two questions:
The latest model responses and judgment files seem to be missing from this Hugging Face repository, which prevents us from fully reproducing the leaderboard. Could you clarify if these files will be made available?
When evaluating custom models, we’ve noticed that adding a new model impacts the Style Control Score of all models, leading to inconsistent results across evaluations. Is there a recommended approach for obtaining stable scores when assessing new models?
Thank you for your assistance!
Sorry, I just uploaded the newest model answers and judgments to the Hugging Face repo.
Due to the nature of how style control works, adding a new model will affect the style control scores of all models. Style Control is a statistical model that seeks to learn the effect of response length and style on the judge's decisions, conditioned on the dataset in question, so the scores are inherently dataset-dependent. The effects of response length and style are estimated as logistic regression coefficients.

One way to obtain a more stable score is to add only one model at a time and take that (n + 1)-th model's score as your official score. You could also, in principle, lock in the coefficients, but that method is not currently implemented. Additionally, the style control scores should, in theory, improve as you add more models to the dataset, since you are giving the statistical model more data from which to learn the judge's biases.
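To make the coupling concrete, here is a minimal sketch of a style-controlled Bradley-Terry fit expressed as a single logistic regression, with one style covariate (normalized length difference) alongside the per-model indicators. The data layout, function name, and the choice of a single length covariate are illustrative assumptions, not the repository's actual code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_style_control(battles, n_models):
    """battles: iterable of (model_a, model_b, len_a, len_b, a_wins) tuples,
    where model indices are ints in [0, n_models) and a_wins is 0 or 1."""
    X, y = [], []
    for a, b, len_a, len_b, a_wins in battles:
        row = np.zeros(n_models + 1)
        row[a], row[b] = 1.0, -1.0                   # Bradley-Terry indicators
        row[-1] = (len_a - len_b) / (len_a + len_b)  # style covariate: length diff
        X.append(row)
        y.append(a_wins)
    # No intercept: only relative model strengths are identifiable.
    clf = LogisticRegression(fit_intercept=False)
    clf.fit(np.array(X), np.array(y))
    ratings = clf.coef_[0][:n_models]  # style-controlled score per model
    style_coef = clf.coef_[0][-1]      # learned length bias of the judge
    return ratings, style_coef
```

Because the ratings and the style coefficient are estimated jointly from all battles, adding one new model's battles shifts every coefficient. Freezing `style_coef` from a previous fit and refitting only the model indicators would be one way to realize the coefficient-locking idea mentioned above.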
Thank you for your response, but we found that the answers and judgments for yi-lightning, gpt-4o-2024-08-06, qwen2.5-72b-instruct, and gemma-2-9b-it are still missing.