High Performance Word Sense Alignment by Joint Modeling of Sense Distance and Gloss Similarity

Matuschek, Michael ; Gurevych, Iryna
Eds.: Tsujii, Junichi ; Hajic, Jan (2014)
High Performance Word Sense Alignment by Joint Modeling of Sense Distance and Gloss Similarity.
Dublin, Ireland
Conference publication, Bibliography

Abstract

In this paper, we present a machine learning approach for word sense alignment (WSA) which combines distances between senses in the graph representations of lexical-semantic resources with gloss similarities. In this way, we significantly outperform the state of the art on each of the four datasets we consider. Moreover, we present two novel datasets for WSA between Wiktionary and Wikipedia in English and German. The latter dataset is not only of unprecedented size, but also created by the large community of Wiktionary editors instead of expert annotators, making it an interesting subject of study in its own right as the first crowdsourced WSA dataset. We will make both datasets freely available along with our computed alignments.
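
As a rough illustration of the idea in the abstract, the sketch below computes two features for a candidate sense pair: a shortest-path distance in a sense graph and a bag-of-words cosine similarity between glosses. It is a minimal sketch under stated assumptions, not the authors' implementation: the sense identifiers, the toy graph, the glosses, the `unreachable` placeholder value, and all function names are hypothetical, and the abstract does not specify which distance measure, similarity measure, or learner the paper uses.

```python
import math
from collections import Counter, deque


def shortest_path_length(graph, source, target):
    """Breadth-first search over an adjacency dict; returns None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def gloss_similarity(gloss_a, gloss_b):
    """Cosine similarity between bag-of-words vectors of two glosses."""
    a = Counter(gloss_a.lower().split())
    b = Counter(gloss_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(graph, glosses, sense_a, sense_b, unreachable=99):
    """Return (graph distance, gloss similarity) for one candidate alignment.

    `unreachable` is an assumed placeholder for pairs with no connecting path.
    """
    dist = shortest_path_length(graph, sense_a, sense_b)
    sim = gloss_similarity(glosses[sense_a], glosses[sense_b])
    return (unreachable if dist is None else dist, sim)


# Hypothetical toy data: senses from resource "A" and resource "B".
graph = {
    "A:bank#1": ["A:money#1"],
    "A:money#1": ["A:bank#1", "B:Bank_(finance)"],
    "B:Bank_(finance)": ["A:money#1"],
    "A:bank#2": ["A:river#1"],
    "A:river#1": ["A:bank#2"],
}
glosses = {
    "A:bank#1": "an institution that accepts deposits of money",
    "A:bank#2": "the sloping land beside a river",
    "B:Bank_(finance)": "a financial institution that accepts deposits",
}

good_pair = pair_features(graph, glosses, "A:bank#1", "B:Bank_(finance)")
bad_pair = pair_features(graph, glosses, "A:bank#2", "B:Bank_(finance)")
```

Feature pairs like these could then be passed to any supervised classifier that decides whether two senses should be aligned; which learner is appropriate is left open here.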

Item type: Conference publication
Published: 2014
Editors: Tsujii, Junichi ; Hajic, Jan
Author(s): Matuschek, Michael ; Gurevych, Iryna
Type of entry: Bibliography
Title: High Performance Word Sense Alignment by Joint Modeling of Sense Distance and Gloss Similarity
Language: English
Publication date: August 2014
Publisher: Dublin City University and Association for Computational Linguistics
Book title: Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014)
Event location: Dublin, Ireland
URL / URN: http://www.aclweb.org/anthology/C14-1025
Uncontrolled keywords: UKP_reviewed;UKP_p_UBY;UKP_a_LangTech4eHum
ID number: TUD-CS-2014-0113
Division(s)/Department(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 24 Jan 2020 12:03