
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures

Bär, Daniel ; Biemann, Chris ; Gurevych, Iryna ; Zesch, Torsten (2012)
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures.
Conference publication, Bibliography

Abstract

We present the UKP system, which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the possible 300+ features implemented.
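To illustrate the approach summarized in the abstract, here is a minimal, hypothetical Python sketch (not the authors' implementation): it computes a few simple string-similarity features for a sentence pair (character n-gram overlap, word overlap, and an LCS-style ratio) and combines them with an ordinary linear regression standing in for the paper's log-linear model. All function names and the toy data are invented for illustration; the actual system draws on the 300+ implemented measures, of which about 20 enter each final per-dataset model.

# Hypothetical sketch of feature combination for STS; not the authors' code.
from difflib import SequenceMatcher
from sklearn.linear_model import LinearRegression

def char_ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-grams (one of many simple features)."""
    grams = lambda s: {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def lcs_ratio(a: str, b: str) -> float:
    """Longest-common-subsequence-style similarity via difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def features(pair):
    a, b = pair
    return [char_ngram_jaccard(a, b), word_overlap(a, b), lcs_ratio(a, b)]

# Toy training data: sentence pairs with gold similarity scores on the 0-5 STS scale.
train_pairs = [
    ("A man is playing a guitar.", "A man plays the guitar.", 4.8),
    ("A dog runs in the park.", "The stock market fell today.", 0.2),
    ("Two kids are eating pizza.", "Children eat a pizza.", 4.0),
]
X = [features((a, b)) for a, b, _ in train_pairs]
y = [score for _, _, score in train_pairs]

# A plain linear regression stands in for the paper's log-linear combination.
model = LinearRegression().fit(X, y)

# Predict a similarity score for a new, unseen sentence pair.
test_pair = ("A woman is slicing an onion.", "Someone cuts an onion.")
print(model.predict([features(test_pair)])[0])

In the paper, this toy setup is replaced by feature selection over the full measure set and a separate log-linear model trained for each dataset.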

Item type: Conference publication
Published: 2012
Author(s): Bär, Daniel ; Biemann, Chris ; Gurevych, Iryna ; Zesch, Torsten
Type of entry: Bibliography
Title: UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
Language: English
Year of publication: June 2012
Book title: Proceedings of the 6th International Workshop on Semantic Evaluation, held in conjunction with the 1st Joint Conference on Lexical and Computational Semantics
URL / URN: http://www.aclweb.org/anthology/S12-1059
Free keywords: UKP_p_WIKULU;UKP_a_NLP4Wikis;UKP_s_DKPro_Similarity;Statistical Semantics
ID number: TUD-CS-2012-0089
Department(s)/division(s): 20 Department of Computer Science
20 Department of Computer Science > Theoretical Computer Science - Cryptography and Computer Algebra
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 24 Jan 2020 12:03