Significance testing for different portions of gold #49

jnothman · 2014-06-12T03:10:53Z

The current significance testing only allows us to compare two system outputs on the same gold standard. It would be good to have a bootstrap resampling version that, if I understand correctly, can tell us whether a system performs significantly better on one dataset than on another (i.e. we could evaluate whether the sample of system scores on the politics domain seems to be drawn from the same population as a sample of system scores on the sports domain).
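
A minimal sketch of what such a test might look like, assuming each portion of the gold standard can be broken into documents with per-document (tp, fp, fn) counts and that the score of interest is micro-averaged F1. The function names `micro_f1` and `bootstrap_domain_diff` are hypothetical and not part of the existing significance-testing code. Documents are resampled with replacement independently within each portion, and a percentile confidence interval is taken over the resulting score differences:

```python
import random


def micro_f1(docs):
    """Micro-averaged F1 from a list of per-document (tp, fp, fn) triples."""
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_domain_diff(docs_a, docs_b, trials=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1(docs_a) - F1(docs_b).

    Hypothetical helper: each portion of gold (e.g. politics vs sports) is
    resampled at the document level, independently of the other portion.
    Returns the observed difference and a (1 - alpha) confidence interval.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = micro_f1(docs_a) - micro_f1(docs_b)
    diffs = []
    for _ in range(trials):
        # Draw a same-sized sample of documents, with replacement, per portion
        sample_a = rng.choices(docs_a, k=len(docs_a))
        sample_b = rng.choices(docs_b, k=len(docs_b))
        diffs.append(micro_f1(sample_a) - micro_f1(sample_b))
    diffs.sort()
    lo = diffs[int(trials * alpha / 2)]
    hi = diffs[int(trials * (1 - alpha / 2)) - 1]
    return observed, (lo, hi)


if __name__ == "__main__":
    # Made-up per-document counts, for illustration only
    politics = [(3, 1, 2), (4, 0, 1), (2, 2, 2), (5, 1, 0)]
    sports = [(1, 3, 2), (2, 2, 3), (3, 1, 1)]
    diff, (lo, hi) = bootstrap_domain_diff(politics, sports)
    print(f"F1 difference {diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes 0, the gap between the two portions is unlikely to be explained by the particular sample of documents alone. A permutation test that shuffles documents between the two portions would address the "same population" framing more directly; the bootstrap version is shown here only because it is what the issue asks about.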
