
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures

Bär, Daniel and Biemann, Chris and Gurevych, Iryna and Zesch, Torsten (2012):
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures.
In: Proceedings of the 6th International Workshop on Semantic Evaluation, held in conjunction with the 1st Joint Conference on Lexical and Computational Semantics. Online: http://www.aclweb.org/anthology/S12-1059
[Conference or Workshop Item]

Abstract

We present the UKP system, which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the provided training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add further lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features out of the 300+ features implemented.
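As a rough illustrative sketch only (not the authors' implementation), the core idea of combining several text similarity features with a regression model could look like the following Python snippet. The feature choices (character n-gram Jaccard, word overlap), the use of scikit-learn, and plain linear regression as a stand-in for the paper's log-linear model are all assumptions made for illustration.

```python
# Illustrative sketch: combine simple text similarity features with a
# regression model, loosely in the spirit of the UKP approach.
# Assumptions: scikit-learn, toy features, and plain linear regression
# as a stand-in for the paper's log-linear model.
import numpy as np
from sklearn.linear_model import LinearRegression


def char_ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-grams of two strings."""
    grams_a = {a[i:i + n] for i in range(max(len(a) - n + 1, 1))}
    grams_b = {b[i:i + n] for i in range(max(len(b) - n + 1, 1))}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def word_overlap(a: str, b: str) -> float:
    """Proportion of shared word types between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def features(pair):
    """Build a small feature vector for one sentence pair."""
    s1, s2 = pair
    return [char_ngram_jaccard(s1, s2, 2),
            char_ngram_jaccard(s1, s2, 3),
            word_overlap(s1, s2)]


# Toy training data: sentence pairs with gold similarity scores in [0, 5].
train_pairs = [("A man is playing a guitar.", "A man plays the guitar."),
               ("A dog runs in the park.", "The stock market fell today.")]
train_scores = [4.5, 0.2]

X = np.array([features(p) for p in train_pairs])
model = LinearRegression().fit(X, train_scores)

# Predict a similarity score for an unseen pair.
test_pair = ("A woman is slicing an onion.", "Someone is cutting an onion.")
print(model.predict(np.array([features(test_pair)])))
```

In the actual system, the feature set is far richer (ESA vectors, lexical-semantic resource similarities, lexical substitution and MT-based expansions), and a feature selection step reduces the 300+ implemented measures to about 20 per dataset.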

Item Type: Conference or Workshop Item
Published: 2012
Creators: Bär, Daniel and Biemann, Chris and Gurevych, Iryna and Zesch, Torsten
Title: UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
Language: English
Title of Book: Proceedings of the 6th International Workshop on Semantic Evaluation, held in conjunction with the 1st Joint Conference on Lexical and Computational Semantics
Uncontrolled Keywords: UKP_p_WIKULU; UKP_a_NLP4Wikis; UKP_s_DKPro_Similarity; Statistical Semantics
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Theoretical Computer Science - Cryptography and Computer Algebra
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date Deposited: 31 Dec 2016 14:29
Official URL: http://www.aclweb.org/anthology/S12-1059
Identification Number: TUD-CS-2012-0089