Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks

Martin, Teresa and Botschen, Fiete and Nagesh, Ajay and McCallum, Andrew :
Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks.
[Online-Edition: http://www.aclweb.org/anthology/W/W16/W16-1317.pdf]
Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016 held in conjunction with NAACL 2016
[Conference or Workshop Item], (2016)

Official URL: http://www.aclweb.org/anthology/W/W16/W16-1317.pdf

Abstract

This paper is an attempt to raise pertinent questions and act as a platform to generate fruitful discussions within the AKBC community about the need for a large-scale dataset for relation extraction. For proper training and evaluation of relation extraction tasks, the weaknesses of the datasets used so far need to be tackled: mainly their size (too small) and the amount of data that is actually labelled (unlabelled data leads to recall problems). We have the vision of building a new, large, and fully labelled dataset for entity pairs connected via binary relations from both Freebase and other datasets, such as Clueweb. Concerning the process of building it, we present pioneering work on a roadmap which will serve as the foundation for the intended discussion within the community. Points to discuss arise within the following steps: first, the source data has to be preprocessed to ensure that the set of relations consists of valid relations only; second, we suggest a method to find the most relevant relations for an entity pair; and third, we outline approaches for actually labelling the data. Several key issues in the process of generating this dataset need to be discussed. This will enable us to thoroughly create a dataset that has the potential to serve as a standard for the community.

Item Type: Conference or Workshop Item
Published: 2016
Creators: Martin, Teresa and Botschen, Fiete and Nagesh, Ajay and McCallum, Andrew
Title: Call for Discussion: Building a New Standard Dataset for Relation Extraction Tasks
Language: English

Title of Book: Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016 held in conjunction with NAACL 2016
Uncontrolled Keywords: reviewed;UKP_reviewed;AIPHES_area_c3
Divisions: Department of Computer Science
Department of Computer Science > Ubiquitous Knowledge Processing
DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Event Location: San Diego
Date Deposited: 30 Dec 2016 17:45
Identification Number: TUD-CS-2016-0127