Steiner, Petra
Eds.: Nicolai, Garrett ; Cotterell, Ryan (2019)
Augmenting a German Morphological Database by Data-Intense Methods.
16th Workshop on Computational Research in Phonetics, Phonology, and Morphology. Florence, Italy (02.08.2019-02.08.2019)
doi: 10.18653/v1/W19-4221
Conference publication, Bibliography
Abstract
This paper deals with the automatic enhancement of a new German morphological database. While there are some databases for flat word segmentation, this is the first available resource which can be directly used for deep parsing of German words. We combine the entries of this morphological database with the morphological tools SMOR and Moremorph and a context-based evaluation method which builds on a large Wikipedia corpus. We describe the state of the art and the essential characteristics of the database and the context method. The approach is tested on an inflight magazine of Lufthansa. We derive over 5,000 new instances of complex words. The coverage for the lemma types reaches up to over 99 percent. The precision of new found complex splits and monomorphemes is between 0.93 and 0.99.
Entry type: | Conference publication |
---|---|
Published: | 2019 |
Editors: | Nicolai, Garrett ; Cotterell, Ryan |
Author(s): | Steiner, Petra |
Record type: | Bibliography |
Title: | Augmenting a German Morphological Database by Data-Intense Methods |
Language: | English |
Publication date: | 3 August 2019 |
Publisher: | ACL |
Book title: | SIGMORPHON 2019: The 16th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology: Proceedings of the Workshop |
Event title: | 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology |
Event location: | Florence, Italy |
Event date: | 02.08.2019-02.08.2019 |
DOI: | 10.18653/v1/W19-4221 |
URL / URN: | https://aclanthology.org/W19-4221/ |
Department(s)/Field(s): | Central Facilities; Central Facilities > Universitäts- und Landesbibliothek (ULB) |
Date deposited: | 19 Jun 2023 09:53 |
Last modified: | 19 Jun 2023 09:53 |