
Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations

Rücklé, Andreas ; Eger, Steffen ; Peyrard, Maxime ; Gurevych, Iryna (2018)
Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations.
In: arXiv:1803.01400
Article, Bibliography

Abstract

Average word embeddings are a common baseline for more sophisticated sentence embedding techniques. However, they typically fall short of the performances of more complex models such as InferSent. Here, we generalize the concept of average word embeddings to power mean word embeddings. We show that the concatenation of different types of power mean word embeddings considerably closes the gap to state-of-the-art methods monolingually and substantially outperforms these more complex techniques cross-lingually. In addition, our proposed method outperforms different recently proposed baselines such as SIF and Sent2Vec by a solid margin, thus constituting a much harder-to-beat monolingual baseline.
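The generalization described in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of the power mean, (1/n · Σ x_i^p)^(1/p), applied per embedding dimension, so that p = 1 gives the arithmetic average, p → +∞ the maximum, and p → −∞ the minimum. The function names `power_mean` and `concat_power_means`, the default set of p values, and the toy vectors are assumptions made for illustration and are not taken from this record.

```python
import numpy as np


def power_mean(word_vectors, p):
    """Per-dimension power mean of a sentence's word vectors.

    word_vectors: array-like of shape (n_words, dim).
    p: 1, +inf, -inf, or an odd positive integer (illustrative restriction
       so that negative embedding components stay real-valued).
    """
    x = np.asarray(word_vectors, dtype=float)
    if p == float("inf"):
        return x.max(axis=0)   # limit p -> +inf is the maximum
    if p == float("-inf"):
        return x.min(axis=0)   # limit p -> -inf is the minimum
    if p == 1:
        return x.mean(axis=0)  # ordinary average word embedding
    # Odd positive integer p: take the signed p-th root of the mean of powers.
    m = np.mean(x ** p, axis=0)
    return np.sign(m) * np.abs(m) ** (1.0 / p)


def concat_power_means(word_vectors, ps=(1, float("-inf"), float("inf"))):
    """Concatenate several power means into one sentence vector."""
    return np.concatenate([power_mean(word_vectors, p) for p in ps])


# Toy example: a two-word "sentence" with 2-dimensional embeddings.
sentence = [[1.0, -2.0], [3.0, 4.0]]
embedding = concat_power_means(sentence)
print(embedding.shape)  # one block of size dim per value of p
```

Each value of p contributes one block of the original embedding dimension, so the resulting sentence vector grows linearly with the number of power means that are concatenated.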

Item type: Article
Published: 2018
Author(s): Rücklé, Andreas ; Eger, Steffen ; Peyrard, Maxime ; Gurevych, Iryna
Entry type: Bibliography
Title: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations
Language: English
Publication date: March 2018
Title of journal, newspaper, or series: arXiv:1803.01400
URL / URN: https://arxiv.org/abs/1803.01400
Uncontrolled keywords: UKP_p_QAEduInf;AIPHES_area_b2
ID number: TUD-CS-2018-0050
Division(s)/department(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Date deposited: 06 Mar 2018 08:34
Last modified: 24 Jan 2020 12:03