Using Semantic Similarity for Multi-Label Zero-Shot Classification of Text Documents

Sappadla, Prateek Veeranna ; Nam, Jinseok ; Loza Mencía, Eneldo ; Fürnkranz, Johannes (2016)
Using Semantic Similarity for Multi-Label Zero-Shot Classification of Text Documents.
Conference publication, bibliography

Abstract

In recent years, interest in low-dimensional vector representations of words has grown steadily. Among other things, these representations facilitate computing word similarity and relatedness scores. The best-known algorithms for producing such representations are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Given the simplicity and versatility of vector representations, these findings suggest that our resource can serve as a drop-in replacement to improve systems that rely on medical concept similarity measures.

Item type: Conference publication
Published: 2016
Author(s): Sappadla, Prateek Veeranna ; Nam, Jinseok ; Loza Mencía, Eneldo ; Fürnkranz, Johannes
Entry type: Bibliography
Title: Using Semantic Similarity for Multi-Label Zero-Shot Classification of Text Documents
Language: German
Year of publication: 2016
Book title: Proceedings of the Tenth International Conference on Language Resources and Evaluation
ID number: TUD-CS-2016-14783
Division(s)/field(s): DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 Adaptive Informationsaufbereitung aus heterogenen Quellen
Date deposited: 30 Nov 2017 17:50
Last modified: 13 Dec 2018 17:13