Given two model names (which could be the same, e.g. distilgpt2), load them and use each to "power" one of the two parties engaged in a debate, using the Debate object. It might require a bit of messing around with the Debate object, though, because it has been designed with one model in mind. For instance, the new function could work with two such objects that are manually kept in sync, each powered by one of the model names.
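As a rough illustration of the synchronization idea (not using the repo's Debate class, whose API may differ), here is a minimal sketch that loads two Hugging Face models and alternates their turns on a single shared transcript:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def two_party_debate(model_name_a: str, model_name_b: str, n_rounds: int = 3) -> str:
    """Illustrative only: alternate turns between two models on one shared transcript.
    The prompt format and turn markers ("A:" / "B:") are placeholders, not the repo's."""
    names = (model_name_a, model_name_b)
    tokenizers = [AutoTokenizer.from_pretrained(n) for n in names]
    models = [AutoModelForCausalLM.from_pretrained(n) for n in names]

    transcript = "A debate between party A and party B.\nA:"
    speaker = 0  # party A speaks first
    for _ in range(2 * n_rounds):
        tok, model = tokenizers[speaker], models[speaker]
        inputs = tok(transcript, return_tensors="pt")
        output = model.generate(
            **inputs,
            max_new_tokens=40,
            do_sample=True,
            pad_token_id=tok.eos_token_id,
        )
        # Keep only the newly generated continuation (first line of it).
        new_text = tok.decode(
            output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        transcript += new_text.split("\n")[0]
        # Hand the floor to the other party and prompt its next utterance.
        speaker = 1 - speaker
        transcript += "\nB:" if speaker == 1 else "\nA:"
    return transcript
```

The same loop structure should carry over to two Debate objects kept in sync: each object generates the utterance for "its" party, and the resulting text is mirrored into the other object's transcript.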
The function should return a list of the party ratings for each of n_branches debates, something like [[0.4, 0.6], [0.7, 0.3]]. It will then be straightforward to interpret those in a more meaningful way. I think it would also be appropriate to sanitize the scores, as described in the artifact (i.e. setting individual utterance ratings to zero if they fail to satisfy a few cosmetic constraints). Relevant artifact sections: ArgRank, Obtaining DebateGPT.
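Something along these lines could produce that return format; the cosmetic constraints below are stand-ins for the ones in the artifact, and averaging per party is an assumption about how per-utterance ratings are aggregated:

```python
import re

def sanitize_rating(utterance: str, rating: float) -> float:
    """Zero out an utterance's rating if it fails a few cosmetic checks.
    These checks are placeholders; the artifact defines the real constraints."""
    text = utterance.strip()
    well_formed = (
        text != ""
        and text[0].isupper()
        and text.endswith((".", "?", "!"))
        and re.fullmatch(r"[A-Za-z0-9 ,.'?!-]+", text) is not None
    )
    return rating if well_formed else 0.0

def branch_party_ratings(branches):
    """branches: one list per debate branch of (party_index, utterance, rating) triples.
    Returns one [party_0, party_1] pair per branch, e.g. [[0.4, 0.6], [0.7, 0.3]]."""
    out = []
    for branch in branches:
        totals = [0.0, 0.0]
        counts = [0, 0]
        for party, utterance, rating in branch:
            totals[party] += sanitize_rating(utterance, rating)
            counts[party] += 1
        # Average per party so each entry stays on a 0-1 scale.
        out.append([totals[i] / max(counts[i], 1) for i in (0, 1)])
    return out
```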
Implementing the sanitization function caused the scores of many propositions to drop to zero, including many that are mostly well-formatted but contain extraneous punctuation such as colons and quotation marks. As a result, the two parties' scores often sum to less than one, and in many cases both parties end up with a final score of zero. I am wondering whether it would make sense to relax the sanitization requirements, for example by allowing more types of punctuation, and then re-normalize the scores.
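Concretely, the relaxation could look like the following sketch: the character whitelist additionally admits colons, semicolons, and quotation marks, and each [party_0, party_1] pair is rescaled so it sums to one again (the exact whitelist is a guess, not the artifact's):

```python
import re

# Relaxed whitelist: also accept colons, semicolons, and straight/curly quotes,
# which previously zeroed out otherwise well-formed utterances.
RELAXED_CHARS = re.compile(r"""[A-Za-z0-9 ,.'"“”‘’:;?!-]+""")

def relaxed_sanitize(utterance: str, rating: float) -> float:
    text = utterance.strip()
    ok = text != "" and text[0].isupper() and RELAXED_CHARS.fullmatch(text) is not None
    return rating if ok else 0.0

def renormalize(pair):
    """Rescale a [party_0, party_1] pair to sum to 1; leave all-zero pairs unchanged."""
    total = sum(pair)
    return [p / total for p in pair] if total > 0 else pair
```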