Bär, Daniel ; Biemann, Chris ; Gurevych, Iryna ; Zesch, Torsten (2012)
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures.
Conference publication, Bibliography
Abstract
We present the UKP system which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the possible 300+ features implemented.
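The combination step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the UKP system itself. It uses only three toy features (character trigram overlap, word overlap, and a longest common subsequence ratio). It assumes that "log-linear" means a linear regression over log-transformed feature values, and it adds a small ridge term purely for numerical stability. The sentence pairs and their scores are invented, and all function names are hypothetical.

```python
import math

def char_ngrams(text, n=3):
    """Set of character n-grams of the lower-cased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def word_ngrams(text, n=1):
    """Set of word n-grams (as tuples) over whitespace tokens."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def lcs_ratio(s, t):
    """Token-level longest common subsequence, normalised by the longer text."""
    a, b = s.lower().split(), t.lower().split()
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))

def features(s, t):
    """Bias term plus log-transformed similarity scores (assumed transform)."""
    raw = [
        jaccard(char_ngrams(s), char_ngrams(t)),
        jaccard(word_ngrams(s), word_ngrams(t)),
        lcs_ratio(s, t),
    ]
    return [1.0] + [math.log1p(v) for v in raw]

def fit(X, y, ridge=1e-3):
    """Solve (X^T X + ridge * I) w = X^T y by Gauss-Jordan elimination."""
    d = len(X[0])
    A = [
        [sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(d)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(d)
    ]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(d):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][d] / A[i][i] for i in range(d)]

def predict(w, s, t):
    """Weighted sum of the features for one sentence pair."""
    return sum(wi * xi for wi, xi in zip(w, features(s, t)))

# Invented training pairs with invented similarity scores on a 0-5 scale.
pairs = [
    ("a man is playing a guitar", "a man is playing a guitar", 5.0),
    ("a man is playing a guitar", "a man plays the guitar", 4.0),
    ("a woman is slicing an onion", "a man is playing a guitar", 1.0),
    ("the cat sat on the mat", "stock markets fell sharply today", 0.0),
]
X = [features(s, t) for s, t, _ in pairs]
y = [score for _, _, score in pairs]
w = fit(X, y)
```

Calling `predict(w, s, t)` on a new sentence pair then returns a similarity estimate. In a setting like the one the abstract describes, one such model would be fitted per dataset, with a far larger and richer feature set.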
Item type: | Conference publication |
---|---|
Published: | 2012 |
Author(s): | Bär, Daniel ; Biemann, Chris ; Gurevych, Iryna ; Zesch, Torsten |
Type of entry: | Bibliography |
Title: | UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures |
Language: | English |
Publication date: | June 2012 |
Book title: | Proceedings of the 6th International Workshop on Semantic Evaluation, held in conjunction with the 1st Joint Conference on Lexical and Computational Semantics |
URL / URN: | http://www.aclweb.org/anthology/S12-1059 |
Uncontrolled keywords: | UKP_p_WIKULU;UKP_a_NLP4Wikis;UKP_s_DKPro_Similarity;Statistical Semantics |
ID number: | TUD-CS-2012-0089 |
Department(s)/Division(s): | 20 Department of Computer Science; 20 Department of Computer Science > Theoretical Computer Science - Cryptography and Computer Algebra; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 31 Dec 2016 14:29 |
Last modified: | 24 Jan 2020 12:03 |