
In-tool Learning for Selective Manual Annotation in Large Corpora

Do Dinh, Erik-Lân and Eckart de Castilho, Richard and Gurevych, Iryna (2015):
In-tool Learning for Selective Manual Annotation in Large Corpora.
In: Proceedings of ACL-IJCNLP 2015 System Demonstrations, Association for Computational Linguistics and The Asian Federation of Natural Language Processing, Beijing, China. Online edition: http://www.aclweb.org/anthology/P15-4003
[Conference or Workshop Item]

Abstract

We present a novel approach to the selective annotation of large corpora through the use of machine learning. Linguistic search engines used to locate potential instances of an infrequent phenomenon do not support ranking of the search results. This favors the use of high-precision queries that return only a few results over broader queries that have a higher recall. Our approach introduces a classifier that ranks the search results and thereby helps the annotator focus on those results most likely to be instances of the phenomenon in question, even for low-precision queries. The classifier is trained in an in-tool fashion: apart from preprocessing, it relies only on the manual annotations made by users in the querying tool itself. To implement this approach, we build upon an existing web-based multi-user search and annotation tool.
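The abstract does not name the classifier or its features. As a minimal sketch of the general idea only, assuming each search hit is reduced to a bag of tokens and each annotator decision is binary (confirmed instance or not), a smoothed naive Bayes classifier (a stand-in chosen for brevity, not necessarily the model used in the paper) can be trained solely on the in-tool annotations and then used to re-rank the hits that have not been annotated yet. `SearchResult`, `train_naive_bayes`, `score`, and `rank` are hypothetical names introduced for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Hypothetical representation of one hit returned by the search engine:
    # an identifier plus the (preprocessed) tokens of the matched context.
    result_id: str
    tokens: list[str]


def train_naive_bayes(annotated):
    """Train only on in-tool annotations: (SearchResult, bool) pairs,
    where True means the annotator confirmed the phenomenon."""
    counts = {True: Counter(), False: Counter()}  # token counts per class
    docs = Counter()  # number of annotated hits per class
    for result, label in annotated:
        counts[label].update(result.tokens)
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def score(model, result):
    """Log-odds that a hit is a true instance (higher = more promising)."""
    counts, docs, vocab = model
    total = docs[True] + docs[False]
    # Log prior ratio, add-one smoothed so an empty class does not yield log(0).
    s = math.log((docs[True] + 1) / (total + 2)) - math.log((docs[False] + 1) / (total + 2))
    n_true = sum(counts[True].values())
    n_false = sum(counts[False].values())
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
    for tok in result.tokens:
        p_true = (counts[True][tok] + 1) / (n_true + v)
        p_false = (counts[False][tok] + 1) / (n_false + v)
        s += math.log(p_true) - math.log(p_false)
    return s


def rank(model, results):
    """Order unannotated hits so the most likely instances come first."""
    return sorted(results, key=lambda r: score(model, r), reverse=True)


# Toy usage: two annotated hits, two hits still awaiting annotation.
annotated = [
    (SearchResult("a", ["it", "was", "john", "who", "left"]), True),
    (SearchResult("b", ["it", "was", "raining", "today"]), False),
]
unannotated = [
    SearchResult("c", ["it", "was", "raining", "again"]),
    SearchResult("d", ["it", "was", "mary", "who", "called"]),
]
model = train_naive_bayes(annotated)
ranked = rank(model, unannotated)
print([r.result_id for r in ranked])  # ['d', 'c']
```

Because the model is retrained from whatever annotations exist in the tool, each new judgment can immediately change the order in which the remaining hits are shown, which is what lets a broad, low-precision query remain usable.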

Item Type: Conference or Workshop Item
Published: 2015
Creators: Do Dinh, Erik-Lân and Eckart de Castilho, Richard and Gurevych, Iryna
Title: In-tool Learning for Selective Manual Annotation in Large Corpora
Language: English

Title of Book: Proceedings of ACL-IJCNLP 2015 System Demonstrations
Publisher: Association for Computational Linguistics and The Asian Federation of Natural Language Processing
Uncontrolled Keywords: Knowledge Discovery in Scientific Literature;UKP_a_LangTech4eHum;UKP_s_CSniper;UKP_reviewed
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
DFG-Graduiertenkollegs
DFG-Graduiertenkollegs > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Event Location: Beijing, China
Date Deposited: 31 Dec 2016 14:29
Official URL: http://www.aclweb.org/anthology/P15-4003
Identification Number: TUD-CS-2015-0098
