Merging the Trees: Building a Morphological Treebank for German from Two Resources

Steiner, Petra (2018)
Merging the Trees: Building a Morphological Treebank for German from Two Resources.
16th International Workshop on Treebanks and Linguistic Theories. Prague, Czech Republic (23-24 January 2018)
Conference publication, Bibliography

Abstract

This paper deals with the creation of the first morphological treebank for German, built by merging two pre-existing linguistic databases. The first is CELEX, a standard resource for German morphology; we build on its refurbished and modernized version. The second is GermaNet, a lexical-semantic network that also provides partial markup for compounds. We describe the state of the art, the essential characteristics of both databases, and our latest revisions. Because the two sources use distinct annotation schemes, deriving morphological trees for the unified resource is not trivial. We discuss how we overcome problems with the data and the format, in particular how we deal with overlaps between the sources and with their complementary scopes. The resulting database comprises about 100,000 trees, whose format can be chosen according to the requirements of the application at hand. Finally, we outline some future directions for morphological treebanks. The Perl script for generating the data from the sources will be made publicly available on our website.

Entry type: Conference publication
Published: 2018
Author(s): Steiner, Petra
Type of entry: Bibliography
Title: Merging the Trees: Building a Morphological Treebank for German from Two Resources
Language: German
Publication date: 25 January 2018
Publisher: Charles University
Book title: Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories
Event title: 16th International Workshop on Treebanks and Linguistic Theories
Event location: Prague, Czech Republic
Event date: 23-24 January 2018
URL / URN: urn:nbn:de:bsz:mh39-71600
Department(s)/division(s): Central Facilities
Central Facilities > Universitäts- und Landesbibliothek (ULB)
Date deposited: 20 Jun 2023 11:30
Last modified: 20 Jun 2023 11:30