Medical Concept Embeddings via Labeled Background Corpora

Loza Mencía, Eneldo and de Melo, Gerard and Nam, Jinseok (2016):
Medical Concept Embeddings via Labeled Background Corpora.
In: Proceedings of the Tenth International Conference on Language Resources and Evaluation, [Conference or Workshop Item]

Abstract

In recent years, we have seen growing interest in low-dimensional vector representations of words. Among other things, these facilitate computing word similarity and relatedness scores. The best-known algorithms for producing representations of this sort are the word2vec approaches. In this paper, we investigate a new model to induce such vector spaces for medical concepts, based on a joint objective that exploits not only word co-occurrences but also manually labeled documents, as available from sources such as PubMed. Our extensive experimental analysis shows that our embeddings lead to significantly higher correlations with human similarity and relatedness assessments than previous work. Due to the simplicity and versatility of vector representations, these findings suggest that our resource can easily be used as a drop-in replacement to improve any system relying on medical concept similarity measures.

Item Type: Conference or Workshop Item
Published: 2016
Creators: Loza Mencía, Eneldo and de Melo, Gerard and Nam, Jinseok
Title: Medical Concept Embeddings via Labeled Background Corpora
Language: German
Title of Book: Proceedings of the Tenth International Conference on Language Resources and Evaluation
Divisions: DFG-Graduiertenkollegs
DFG-Graduiertenkollegs > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Date Deposited: 30 Nov 2017 17:41
Identification Number: TUD-CS-2016-14782