Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data

Tauchmann, Christopher ; Arnold, Thomas ; Hanselowski, Andreas ; Meyer, Christian M. ; Mieskes, Margot (2018)
Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data.
Miyazaki, Japan
Conference publication, Bibliography

Abstract

Automatic summarization has so far focused on datasets of ten to twenty rather short documents of mostly news articles. But automatic systems could in theory analyze hundreds of documents from a range of sources and provide an overview to the interested reader. Such a summary would ideally present the most general issues in a specific topic and allow for more in-depth information on specific aspects within said topic. In this paper, we present a new approach for creating hierarchical summarization corpora by first, extracting relevant content from large, heterogeneous document collections using crowdsourcing and second, ordering the relevant information hierarchically by trained annotators. Our resulting corpus can be used to develop and evaluate hierarchical summarization systems.

Item type: Conference publication
Published: 2018
Author(s): Tauchmann, Christopher ; Arnold, Thomas ; Hanselowski, Andreas ; Meyer, Christian M. ; Mieskes, Margot
Type of entry: Bibliography
Title: Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data
Language: English
Date of publication: May 2018
Publisher: European Language Resources Association
Book title: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC)
Event location: Miyazaki, Japan
URL / URN: http://www.lrec-conf.org/proceedings/lrec2018/summaries/252....
Uncontrolled keywords: reviewed;AIPHES_corpus;AIPHES_area_c1
ID number: TUD-CS-2018-0007
Department(s)/division(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Date deposited: 14 Dec 2017 14:24
Last modified: 15 Oct 2018 09:10