Interactive Summarization of Large Document Collections

Hättasch, Benjamin ; Meyer, Christian M. ; Binnig, Carsten (2019)
Interactive Summarization of Large Document Collections.
Workshop on Human-In-the-Loop Data Analytics. Amsterdam (05.07.2019-05.07.2019)
doi: 10.1145/3328519.3329129
Conference or Workshop Item, Bibliography

Abstract

We present a new system for custom summarization of large text corpora at interactive speed. Producing textual summaries is an important step in understanding large collections of topic-related documents and has many real-world applications in journalism, medicine, and many other fields. Key to our system is that the summarization model is refined by user feedback and called multiple times to improve the quality of the summary iteratively. To that end, the user is brought into the loop and, in every iteration, gives feedback on which aspects of the intermediate summaries satisfy their individual information needs. Our system consists of a sampling component and a learned model that produces a textual summary. As we show in our evaluation, our system achieves quality similar to that of existing summarization models, which work on the full corpus and therefore cannot provide interactive speeds.
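The loop the abstract describes (sample, summarize, collect feedback, repeat) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the uniform random sampling, and the simple word-weight scorer that stands in for the learned model are all hypothetical.

```python
import random
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def sample_sentences(corpus, sample_size, rng):
    """Draw a uniform random sample so each iteration touches only part of the corpus."""
    return rng.sample(corpus, min(sample_size, len(corpus)))


def score(sentence, weights):
    """Average weight of the words in a sentence (hypothetical stand-in for a learned model)."""
    words = tokenize(sentence)
    return sum(weights.get(w, 0.0) for w in words) / max(len(words), 1)


def summarize(candidates, weights, budget):
    """Pick the `budget` highest-scoring candidate sentences."""
    ranked = sorted(candidates, key=lambda s: score(s, weights), reverse=True)
    return ranked[:budget]


def interactive_summarize(corpus, feedback_fn, iterations=3,
                          sample_size=4, budget=2, seed=0):
    """Run the sample -> summarize -> feedback loop and return the last summary.

    `feedback_fn` takes the intermediate summary and returns a pair
    (liked_words, disliked_words); in a real system this is the user.
    """
    rng = random.Random(seed)
    weights = {}  # word -> importance, refined by feedback in every iteration
    summary = []
    for _ in range(iterations):
        candidates = sample_sentences(corpus, sample_size, rng)
        summary = summarize(candidates, weights, budget)
        liked, disliked = feedback_fn(summary)
        for word in liked:
            weights[word] = weights.get(word, 0.0) + 1.0
        for word in disliked:
            weights[word] = weights.get(word, 0.0) - 1.0
    return summary


if __name__ == "__main__":
    corpus = [
        "The vaccine trial reported strong results",
        "Stocks fell sharply on Monday",
        "A new vaccine was approved by regulators",
        "The football season starts next week",
    ]
    # Simulated user who cares about vaccines and not about stocks.
    simulated_user = lambda summary: ({"vaccine"}, {"stocks"})
    for line in interactive_summarize(corpus, simulated_user):
        print(line)
```

Because each iteration scores only the sampled sentences, the cost per round depends on the sample size and not on the corpus size, which is the property that would make interactive response times plausible in such a design.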

Item Type: Conference or Workshop Item
Published: 2019
Creators: Hättasch, Benjamin ; Meyer, Christian M. ; Binnig, Carsten
Type of entry: Bibliography
Title: Interactive Summarization of Large Document Collections
Language: English
Date: July 2019
Place of Publication: Amsterdam, Netherlands
Book Title: HILDA'19: Proceedings of the ...
Event Title: Workshop on Human-In-the-Loop Data Analytics
Event Location: Amsterdam
Event Dates: 05.07.2019-05.07.2019
DOI: 10.1145/3328519.3329129
URL / URN: https://hilda.io/2019/proceedings/HILDA2019_paper_4.pdf
Uncontrolled Keywords: Text Summarization, Machine Learning, Approximate Computing, AIPHES_area_d2, dm, dm_vi_ml, dm_sherlock
Additional Information: Article No. 9

Divisions: 20 Department of Computer Science
20 Department of Computer Science > Data Management (renamed Data and AI Systems in 2022)
DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Date Deposited: 26 Apr 2019 13:27
Last Modified: 22 Apr 2020 07:41