Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks

Martin, Teresa ; Botschen, Fiete ; Nagesh, Ajay ; McCallum, Andrew (2016)
Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks.
San Diego
Conference publication, Bibliography

Abstract

This paper is an attempt to raise pertinent questions and act as a platform to generate fruitful discussions within the AKBC community about the need for a large-scale dataset for relation extraction. For proper training and evaluation of relation extraction tasks, the weaknesses of the datasets used so far need to be tackled: mainly their size (too small) and the amount of data that is actually labelled (unlabelled data leads to recall problems). Our vision is to build a new, large, and fully labelled dataset of entity pairs connected via binary relations, drawn from both Freebase and other datasets such as ClueWeb. Concerning the building process, we present pioneering work on a roadmap that will serve as the foundation for the intended discussion within the community. Points to discuss arise within the following steps: first, the source data has to be preprocessed to ensure that the set of relations consists of valid relations only; second, we suggest a method to find the most relevant relations for an entity pair; and third, we outline approaches for actually labelling the data. Several key issues in the process of generating this dataset need to be discussed. This will enable us to carefully create a dataset that has the potential to serve as a standard for the community.

Item Type: Conference publication
Published: 2016
Author(s): Martin, Teresa ; Botschen, Fiete ; Nagesh, Ajay ; McCallum, Andrew
Type of Entry: Bibliography
Title: Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks
Language: English
Date of Publication: June 2016
Book Title: Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016 held in conjunction with NAACL 2016
Event Location: San Diego
URL / URN: http://www.aclweb.org/anthology/W/W16/W16-1317.pdf
Uncontrolled Keywords: reviewed;UKP_reviewed;AIPHES_area_c3
ID Number: TUD-CS-2016-0127
Department(s)/Field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Date Deposited: 30 Dec 2016 17:45
Last Modified: 28 Sep 2018 14:54