Ben Hachey edited this page Mar 8, 2014 · 2 revisions

The AIDA/CoNLL data is a large whole-document named entity linking data set. The linking annotations were produced by Hoffart et al. (2011) and are freely available from the AIDA downloads page.

They are based on the NER annotation from the CoNLL 2003 shared task (Tjong Kim Sang & De Meulder, 2003), which is freely available from the shared task page.

The source documents come from the Reuters Corpus Volume 1 (RCV1; Lewis et al., 2004), which is freely available from its NIST data page.

The training and development sets contain stories from 22-31 August 1996, and the test set contains stories from 6-7 December 1996. Tjong Kim Sang & De Meulder (2003) tokenised the documents and manually annotated them with PER, ORG, LOC and MISC entity mentions.

Hoffart et al. (2011) linked every named entity mention to its YAGO2 entity, or to NIL if there is no corresponding KB entity. These links have also been mapped to Wikipedia and Freebase by Massimiliano Ciaramita.
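Because every mention carries either a KB entity or NIL, a common first step when working with the data is to count linked versus NIL mentions per entity type. The sketch below assumes the annotations have already been loaded into a simple in-memory record; the `Mention` class, its field names and the `linking_summary` helper are hypothetical and do not reflect the layout of the released files.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Mention:
    """One linked mention. Hypothetical layout, not the release file format."""
    doc_id: str
    text: str
    ner_type: str              # CoNLL 2003 type: PER, ORG, LOC or MISC
    kb_entity: Optional[str]   # YAGO2 entity name, or None for a NIL mention

    @property
    def is_nil(self) -> bool:
        return self.kb_entity is None


def linking_summary(mentions: Iterable[Mention]) -> Dict[str, Dict[str, int]]:
    """Count linked vs NIL mentions, broken down by NER type."""
    summary: Dict[str, Dict[str, int]] = {}
    for m in mentions:
        counts = summary.setdefault(m.ner_type, {"linked": 0, "nil": 0})
        counts["nil" if m.is_nil else "linked"] += 1
    return summary


if __name__ == "__main__":
    # Toy mentions invented for illustration; they are not taken from the data set.
    sample = [
        Mention("doc1", "Germany", "LOC", "Germany"),
        Mention("doc1", "Jane Doe", "PER", None),
        Mention("doc1", "EU", "ORG", "European_Union"),
    ]
    print(linking_summary(sample))
```

Representing NIL as `None` rather than a sentinel string keeps the check for unlinkable mentions explicit.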
