
Counting What Counts: Decompounding for Keyphrase Extraction

Erbs, Nicolai ; Santos, Pedro Bispo ; Zesch, Torsten ; Gurevych, Iryna (2015)
Counting What Counts: Decompounding for Keyphrase Extraction.
Beijing, China
Conference publication, bibliography

Abstract

A core assumption of keyphrase extraction is that a concept is more important if it is mentioned more often in a document. Especially in languages like German that form large noun compounds, frequency counts might be misleading as concepts “hidden” in compounds are not counted. We hypothesize that using decompounding before counting term frequencies may lead to better keyphrase extraction. We identified two effects of decompounding: (i) enhanced frequency counts, and (ii) more keyphrase candidates. We created two German evaluation datasets to test our hypothesis and analyzed the effect of additional decompounding for keyphrase extraction.
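The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the function names `split_compound` and `count_terms`, and the dictionary-based splitting strategy are all assumptions made for illustration, and German linking elements (such as a connecting "s") are ignored. It only shows how splitting compounds before counting can raise the frequency of a "hidden" concept and introduce additional candidates.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system would use a
# trained or dictionary-backed decompounder instead.
LEXICON = {"haus", "tür", "schlüssel"}


def split_compound(token, lexicon=LEXICON):
    """Split a token into lexicon parts, trying the longest head first.

    Returns [token] (lowercased) when no complete split into two or
    more known parts exists.
    """
    word = token.lower()

    def helper(rest):
        if not rest:
            return []
        for end in range(len(rest), 0, -1):
            head = rest[:end]
            if head in lexicon:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None

    parts = helper(word)
    return parts if parts and len(parts) > 1 else [word]


def count_terms(tokens, decompound=False):
    """Count term frequencies, optionally adding counts for compound parts."""
    counts = Counter()
    for token in tokens:
        counts[token.lower()] += 1
        if decompound:
            parts = split_compound(token)
            if len(parts) > 1:
                counts.update(parts)
    return counts


tokens = ["Haus", "Haustür", "Haustürschlüssel", "Schlüssel"]
plain = count_terms(tokens)
split = count_terms(tokens, decompound=True)
print(plain["haus"], split["haus"])      # frequency of "haus" before and after
print("tür" in plain, "tür" in split)    # whether "tür" is a candidate at all
```

In this toy input, "haus" occurs once as a standalone token but is also contained in two compounds, so its count rises after splitting (effect i), and "tür" only becomes a candidate once compounds are split (effect ii).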

Item type: Conference publication
Published: 2015
Author(s): Erbs, Nicolai ; Santos, Pedro Bispo ; Zesch, Torsten ; Gurevych, Iryna
Type of entry: Bibliography
Title: Counting What Counts: Decompounding for Keyphrase Extraction
Language: English
Publication date: July 2015
Publisher: Association for Computational Linguistics
Book title: Proceedings of the ACL 2015 Workshop on Novel Computational Approaches to Keyphrase Extraction
Event location: Beijing, China
URL / URN: http://www.aclweb.org/anthology/W15-3603

Uncontrolled keywords: UKP_p_WIKULU;UKP_p_DKPro;UKP_a_NLP4Wikis;UKP_reviewed;UKP_s_DKPro_Core
Identification number: TUD-CS-2015-0127
Department(s)/research area(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 24 Jan 2020 12:03