
All-in Text: Learning Document, Label, and Word Representations Jointly

Nam, Jinseok and Loza Mencía, Eneldo and Fürnkranz, Johannes (2016):
All-in Text: Learning Document, Label, and Word Representations Jointly.
In: Proceedings of the AAAI Conference on Artificial Intelligence (Thirtieth AAAI Conference on Artificial Intelligence), [Online-Edition: https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12...],
[Conference or Workshop Item]

Abstract

Conventional multi-label classification algorithms treat the target labels of the classification task as mere symbols that are void of an inherent semantics. However, in many cases textual descriptions of these labels are available or can be easily constructed from public document sources such as Wikipedia. In this paper, we investigate an approach for embedding documents and labels into a joint space while sharing word representations between documents and labels. For finding such embeddings, we rely on the text of documents as well as descriptions for the labels. The use of such label descriptions not only lets us expect an increased performance on conventional multi-label text classification tasks, but can also be used to make predictions for labels that have not been seen during the training phase. The potential of our method is demonstrated on the multi-label classification task of assigning keywords from the Medical Subject Headings (MeSH) to publications in biomedical research, both in a conventional and in a zero-shot learning setting.
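The core idea of the abstract — documents and label descriptions embedded into one space via shared word representations, so that unseen labels can be scored from their text alone — can be sketched as follows. This is a minimal illustration, not the paper's actual model: the word vectors here are random placeholders (the paper learns them jointly), the mean-of-word-vectors composition and the example vocabulary are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared word embedding table (hypothetical tiny vocabulary; in the paper
# these vectors are learned jointly, here they are random placeholders).
vocab = ["heart", "disease", "gene", "expression", "cardiac", "dna"]
dim = 8
word_vecs = {w: rng.normal(size=dim) for w in vocab}

def embed(text):
    """Embed a text as the L2-normalized mean of its word vectors.

    A common simplification: the same word table serves both documents
    and label descriptions, which is what makes zero-shot scoring possible.
    """
    vecs = [word_vecs[w] for w in text.split() if w in word_vecs]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

# A document and two MeSH-style labels, each represented only by text.
doc = embed("cardiac disease gene expression")
labels = {
    "Heart Diseases": embed("heart disease cardiac"),
    "Gene Expression": embed("gene expression dna"),
}

# Because both sides live in the same space, a label never seen during
# training can still be ranked against the document via its description.
scores = {name: float(doc @ vec) for name, vec in labels.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With normalized embeddings the dot product is the cosine similarity, so `ranked` orders labels by how close their descriptions lie to the document in the shared space.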

Item Type: Conference or Workshop Item
Published: 2016
Creators: Nam, Jinseok and Loza Mencía, Eneldo and Fürnkranz, Johannes
Title: All-in Text: Learning Document, Label, and Word Representations Jointly
Language: English

Title of Book: Proceedings of the AAAI Conference on Artificial Intelligence
Uncontrolled Keywords: Knowledge Discovery in Scientific Literature
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Knowledge Engineering
20 Department of Computer Science > Ubiquitous Knowledge Processing
DFG-Graduiertenkollegs
DFG-Graduiertenkollegs > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Event Title: Thirtieth AAAI Conference on Artificial Intelligence
Date Deposited: 31 Dec 2016 00:25
Official URL: https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12...
Identification Number: TUD-CS-2016-0005