Current significance testing only allows us to compare two outputs on the same gold standard. It would be good to have a bootstrap resampling version that, if I understand correctly, could tell us whether a system is substantially better on one dataset than on another (i.e. we could evaluate whether the sample of system scores on the politics domain seems to be drawn from the same population as a sample of system scores on the sports domain).
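A rough sketch of what such a test might look like, assuming we already have one score per document for each domain (for a corpus-level metric, the metric would need to be recomputed on each resample instead of averaging). The function name `bootstrap_domain_diff` and its parameters are hypothetical and not part of the existing code. It draws unpaired bootstrap resamples from each domain and reports a percentile confidence interval for the difference in mean score:

```python
import random
from statistics import mean


def bootstrap_domain_diff(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Unpaired bootstrap of mean(scores_a) - mean(scores_b).

    scores_a, scores_b: per-document scores of ONE system on two datasets
    (an assumption; the datasets may differ in size).
    Returns the observed difference and a (1 - alpha) percentile interval.
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)

    diffs = []
    for _ in range(n_resamples):
        # Resample each dataset independently, with replacement.
        resample_a = rng.choices(scores_a, k=len(scores_a))
        resample_b = rng.choices(scores_b, k=len(scores_b))
        diffs.append(mean(resample_a) - mean(resample_b))

    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return observed, (lo, hi)
```

If the interval excludes zero, that would suggest the system's scores differ between the two domains at roughly the chosen level; an interval that straddles zero gives no evidence of a difference.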