This repository contains the collected data and the crowdsourcing codebase used in the paper "What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?" (to appear at ACL-IJCNLP 2021).
- `data` contains all collected data for our four crowdsourcing protocols.
  - Each question has human validations and model predictions.
- `data/intermediate_stages` contains questions collected in the iterative feedback stages of the `crowd` and `expert` protocols.
- `interface` contains the codebase used in our data collection.
  - Refer to `interface/README.md` for instructions on running the application.
- `interface_screenshots` contains images showing the user interfaces for the tutorial, writing, and validation tasks used for all four protocols.
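
To get a quick overview of what each directory holds after cloning, a minimal sketch like the following may help. It assumes it is run from the repository root. Because this README does not state the file formats, the sketch only counts files by extension and does not parse anything. The helper name `summarize_files` is hypothetical and not part of the repository's codebase.

```python
from collections import Counter
from pathlib import Path


def summarize_files(root):
    """Count files under `root` (recursively), grouped by file extension.

    Hypothetical helper for illustration; it makes no assumption about
    the format of the files themselves.
    """
    root = Path(root)
    if not root.is_dir():
        # Missing directory: return an empty count instead of raising.
        return Counter()
    return Counter(
        p.suffix or "<none>" for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Directory names are the ones listed in this README.
    for name in ("data", "data/intermediate_stages", "interface_screenshots"):
        print(name, dict(summarize_files(name)))
```

Since `data/intermediate_stages` sits inside `data`, its files are also included in the counts reported for `data`.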
The collected data is released under the Creative Commons Attribution 4.0 International License.
@inproceedings{nangia-etal-2021-ingredients,
title={What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult {NLU} Data Collection Tasks?},
author={Nikita Nangia and Saku Sugawara and Harsh Trivedi and Alex Warstadt and Clara Vania and Samuel R. Bowman},
booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing},
month=aug,
year={2021},
address = {Online},
publisher = {Association for Computational Linguistics},