Combining Topic Models for Corpus Exploration: Applying LDA for Complex Corpus Research Tasks in a Digital Humanities Project

Schnober, Carsten ; Gurevych, Iryna (2015)
Combining Topic Models for Corpus Exploration: Applying LDA for Complex Corpus Research Tasks in a Digital Humanities Project.
New York, NY, USA
doi: 10.1145/2809936.2809939
Conference publication, Bibliography

Abstract

We investigate new ways of applying LDA topic models: rather than optimizing a single model for a specific use case, we train multiple models based on different parameters and vocabularies which are combined on-the-fly to comply with varying information retrieval tasks. We also show a semi-automatic method which helps users to identify relevant topics across multiple models. Our methods are demonstrated and evaluated on a real-world use case: a large-scale corpus-based digital humanities project called Welt der Kinder (“Children and their World”). We illustrate our approach in that context and show that it can be generalized to other scenarios. We evaluate this work using empirical methods from information retrieval, but also show visualizations and use cases as actually applied in the project.

Item type: Conference publication
Published: 2015
Author(s): Schnober, Carsten ; Gurevych, Iryna
Type of entry: Bibliography
Title: Combining Topic Models for Corpus Exploration: Applying LDA for Complex Corpus Research Tasks in a Digital Humanities Project
Language: English
Date of publication: October 2015
Publisher: Sheridan Communications
Book title: Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications
Series: TM'15
Event location: New York, NY, USA
DOI: 10.1145/2809936.2809939
URL / URN: http://doi.acm.org/10.1145/2809936.2809939
Uncontrolled keywords: UKP_p_WeltDerKinder;UKP_reviewed;Semantic Information Management;Digital Humanities, Topic Models, Information Retrieval
ID number: TUD-CS-2015-1197
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 24 Jan 2020 12:03