
Refurbishing a Morphological Database for German

Steiner, Petra (2016)
Refurbishing a Morphological Database for German.
10th International Conference on Language Resources and Evaluation. Portoroz, Slovenia (23.05.2016-28.05.2016)
Conference publication, Bibliography

Abstract

The CELEX database is one of the standard lexical resources for German. It provides a wealth of data, especially for phonological and morphological applications. Its morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the 1990s, both its encoding and its spelling are outdated. About one fifth of its more than 50,000 records contain umlauts and characters such as ß, and conversion to a modern version cannot be achieved by simple substitution. In this paper, we briefly describe the original content and form of the orthographic and morphological database for German in CELEX. We then present our work on modernizing the linguistic data. Lemmas and morphological analyses are converted to a modern encoding standard by first merging the orthographic and morphological information of the lemmas and their entries and then performing a second substitution for the morphs within their morphological analyses. Conversion to modern German spelling is performed by substitution rules that follow orthographic standards. We show an example of using the data to disambiguate morphological structures. The discussion outlines prospects for future work on this or similar lexicons. The Perl script is publicly available on our website.
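The two-step conversion summarized in the abstract can be illustrated with a minimal Python sketch; the paper's own tool is a Perl script, which is not reproduced here. The sketch rests on hypothetical assumptions: that CELEX orthography writes an umlaut as a double quote before the vowel (`"a` for ä) and ß as `$`, and that morphs inside the analyses use a folded spelling (`ue` for ü, `ss` for ß). Under those assumptions, `decode_celex` converts the lemma, and `restore_morph` recovers each morph's spelling by aligning it with the decoded lemma rather than blindly replacing `ue`, which would turn a morph such as `Feuer` into `Feür`. The function names and both encoding conventions are illustrative, not the paper's documented format.

```python
import re

# ASSUMPTION: CELEX-style ASCII escapes for German orthography,
# where '"' before a vowel marks an umlaut and '$' stands for ß.
CELEX_ESCAPES = {
    '"a': "ä", '"o': "ö", '"u': "ü",
    '"A': "Ä", '"O': "Ö", '"U': "Ü",
    "$": "ß",
}
_ESCAPE_RE = re.compile("|".join(re.escape(k) for k in CELEX_ESCAPES))

# ASSUMPTION (hypothetical): morphs in the analyses use a folded spelling.
FOLDING = {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def decode_celex(text: str) -> str:
    """Step 1: replace every ASCII escape in a lemma with its Unicode character."""
    return _ESCAPE_RE.sub(lambda m: CELEX_ESCAPES[m.group(0)], text)


def restore_morph(morph: str, lemma: str) -> str:
    """Step 2: spell a folded morph as it appears in the decoded lemma.

    Blindly replacing 'ue' by 'ü' would corrupt words such as 'Feuer',
    so the morph is matched against the folded characters of the lemma.
    Returns the morph unchanged if no alignment is found.
    """
    folded = [FOLDING.get(ch, ch).lower() for ch in lemma]
    target = morph.lower()
    for start in range(len(lemma)):
        acc = ""
        for end in range(start, len(lemma)):
            acc += folded[end]
            if acc == target:
                found = lemma[start:end + 1]
                # keep the capitalization of the original morph (e.g. nouns)
                if morph[:1].isupper():
                    found = found[:1].upper() + found[1:]
                return found
            if not target.startswith(acc):
                break  # this start position cannot produce the morph
    return morph


if __name__ == "__main__":
    lemma = decode_celex('Haust"ur')
    print(lemma)                                               # Haustür
    print([restore_morph(m, lemma) for m in ["Haus", "Tuer"]])  # ['Haus', 'Tür']
    print(restore_morph("Feuer", "Feuerwehr"))                  # Feuer
```

When no alignment exists, for example `Haus` in `Häuschen`, where the umlaut arises through derivation, the sketch leaves the morph untouched so that such cases can be reviewed separately instead of being guessed.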

Entry type: Conference publication
Published: 2016
Author(s): Steiner, Petra
Kind of entry: Bibliography
Title: Refurbishing a Morphological Database for German
Language: English
Publication year: 2016
Publisher: European Language Resources Association (ELRA)
Book title: Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016
Event title: 10th International Conference on Language Resources and Evaluation
Event location: Portoroz, Slovenia
Event date: 23.05.2016-28.05.2016
URL / URN: https://aclanthology.org/L16-1176

Department(s)/division(s): Central Facilities
Central Facilities > University and State Library (ULB)
Date deposited: 19 Jun 2023 11:56
Last modified: 19 Jun 2023 11:56