Using compound lists for German decompounding in a back-off scenario

Santos, Pedro Bispo
Henrich, Verena and Hinrichs, Erhard (eds.) (2014):
Using compound lists for German decompounding in a back-off scenario.
In: Workshop on Computational, Cognitive, and Linguistic Approaches to the Analysis of Complex Words and Collocations (CCLCC 2014), Department of Linguistics (SfS), University of Tübingen and Collaborative Research Center: Emergence of Meaning (SFB 833), University of Tübingen, Tuebingen, Germany, [Online-Edition: http://www.sfs.uni-tuebingen.de/~vhenrich/cclcc_2014/CCLCC_2...],
[Conference or Workshop Item]

Abstract

Lexical resources like GermaNet offer compound lists of reasonable size. These lists can be used as a prior step to existing decompounding algorithms, with the decompounding algorithms then functioning as a back-off mechanism. We investigate whether the use of compound lists can enhance dictionary-based and corpus-based decompounding algorithms. We analyze, on a German gold standard, the effect of an initial decompounding step based on a compound list derived from GermaNet. The results show that applying information from GermaNet can significantly improve all tested decompounding approaches across all metrics. Precision and recall increase by statistically significant margins of .004-.018 and .011-.022, respectively.
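
The back-off arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the compound list entries, the toy vocabulary, and the function names dictionary_split and decompound are all hypothetical, and the naive two-part splitter merely stands in for whichever dictionary-based or corpus-based algorithm serves as the back-off.

```python
from typing import Callable, Dict, List

# Hypothetical compound list mapping a compound to its known split.
# These entries are illustrative only and are not taken from GermaNet.
COMPOUND_LIST: Dict[str, List[str]] = {
    "apfelbaum": ["apfel", "baum"],
    "haustür": ["haus", "tür"],
}

# Toy vocabulary for the back-off splitter (an assumption for this sketch).
VOCABULARY = {"apfel", "baum", "haus", "tür", "buch", "laden"}


def dictionary_split(word: str) -> List[str]:
    """Naive stand-in for a dictionary-based decompounder.

    Returns the first two-part split whose parts are both in VOCABULARY,
    or the word itself if no such split exists.
    """
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in VOCABULARY and tail in VOCABULARY:
            return [head, tail]
    return [word]


def decompound(
    word: str,
    backoff: Callable[[str], List[str]] = dictionary_split,
) -> List[str]:
    """Look the word up in the compound list first; back off otherwise."""
    key = word.lower()
    if key in COMPOUND_LIST:
        # Listed compound: return a copy of the stored split.
        return list(COMPOUND_LIST[key])
    # Unlisted word: defer to the back-off decompounding algorithm.
    return backoff(key)
```

In this sketch, decompound("Apfelbaum") is answered from the list, while decompound("Buchladen") is not listed and is therefore handled by the back-off splitter.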

Item Type: Conference or Workshop Item
Published: 2014
Editors: Henrich, Verena and Hinrichs, Erhard
Creators: Santos, Pedro Bispo
Title: Using compound lists for German decompounding in a back-off scenario
Language: English
Title of Book: Workshop on Computational, Cognitive, and Linguistic Approaches to the Analysis of Complex Words and Collocations (CCLCC 2014)
Publisher: Department of Linguistics (SfS), University of Tübingen and Collaborative Research Center: Emergence of Meaning (SFB 833), University of Tübingen
Uncontrolled Keywords: UKP_a_ENLP;UKP_a_NLP4Wikis
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Event Location: Tuebingen, Germany
Date Deposited: 31 Dec 2016 14:29
Official URL: http://www.sfs.uni-tuebingen.de/~vhenrich/cclcc_2014/CCLCC_2...
Identification Number: TUD-CS-2014-0105