Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite

Sukhareva, Maria ; Fuscagni, Francesco ; Daxenberger, Johannes ; Görke, Susanne ; Prechel, Doris ; Gurevych, Iryna (2017)
Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite.
Vancouver, BC, Canada
Conference publication, bibliography

Abstract

This paper presents a statistical approach to automatic morphosyntactic annotation of Hittite transcripts. Hittite is an extinct Indo-European language using the cuneiform script. There are currently no morphosyntactic annotations available for Hittite, so we explored methods of distant supervision. The annotations were projected from parallel German translations of the Hittite texts. In order to reduce data sparsity, we applied stemming of German and Hittite texts. As there is no off-the-shelf Hittite stemmer, a stemmer for Hittite was developed for this purpose. The resulting annotation projections were used to train a POS tagger, achieving an accuracy of 69% on a test sample. To our knowledge, this is the first attempt of statistical POS tagging of a cuneiform language.

Item type: Conference publication
Published: 2017
Author(s): Sukhareva, Maria ; Fuscagni, Francesco ; Daxenberger, Johannes ; Görke, Susanne ; Prechel, Doris ; Gurevych, Iryna
Entry type: Bibliography
Title: Distantly Supervised POS Tagging of Low-Resource Languages under Extreme Data Sparsity: The Case of Hittite
Language: English
Publication date: August 2017
Book title: LaTeCH-CLfL '17 Proceedings of the 11th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Event location: Vancouver, BC, Canada
URL / URN: http://www.aclweb.org/anthology/W17-2213
Uncontrolled keywords: reviewed; CEDIFOR; UKP_reviewed; UKP_s_DKPro_Core; POS tagging; low resource languages; Hittite
ID number: TUD-CS-2017-0133
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 13 Jun 2017 11:45
Last modified: 24 Jan 2020 12:03
Projects: CEDIFOR