Augmenting a German Morphological Database by Data-Intense Methods

Steiner, Petra
Eds.: Nicolai, Garrett ; Cotterell, Ryan (2019)
Augmenting a German Morphological Database by Data-Intense Methods.
16th Workshop on Computational Research in Phonetics, Phonology, and Morphology. Florence, Italy (02.08.2019)
doi: 10.18653/v1/W19-4221
Conference publication, Bibliography

Abstract

This paper deals with the automatic enhancement of a new German morphological database. While there are some databases for flat word segmentation, this is the first available resource which can be directly used for deep parsing of German words. We combine the entries of this morphological database with the morphological tools SMOR and Moremorph and a context-based evaluation method which builds on a large Wikipedia corpus. We describe the state of the art and the essential characteristics of the database and the context method. The approach is tested on an inflight magazine of Lufthansa. We derive over 5,000 new instances of complex words. The coverage for the lemma types reaches up to over 99 percent. The precision of new found complex splits and monomorphemes is between 0.93 and 0.99.

Item type: Conference publication
Published: 2019
Editors: Nicolai, Garrett ; Cotterell, Ryan
Author(s): Steiner, Petra
Type of entry: Bibliography
Title: Augmenting a German Morphological Database by Data-Intense Methods
Language: English
Publication date: 3 August 2019
Publisher: ACL
Book title: SIGMORPHON 2019: The 16th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology: Proceedings of the Workshop
Event title: 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
Event location: Florence, Italy
Event date: 02.08.2019
DOI: 10.18653/v1/W19-4221
URL / URN: https://aclanthology.org/W19-4221/
Division(s): Central Institutions
Central Institutions > Universitäts- und Landesbibliothek (ULB)
Date deposited: 19 Jun 2023 09:53
Last modified: 19 Jun 2023 09:53