P. V. S., Avinesh (2020)
Information Preparation with the Human in the Loop.
Technische Universität Darmstadt
doi: 10.25534/tuprints-00011839
Dissertation, first publication
Abstract
With the advent of the World Wide Web (WWW) and the rise of digital media consumption, abundant information is now available on any topic. As a result, users often suffer from information overload, which makes finding relevant and important information a great challenge. To alleviate this information overload and provide significant value to users, there is a need for automatic information preparation methods. Such methods need to support users by discovering and recommending important information while filtering out redundant and irrelevant information. They need to ensure that users do not drown in, but rather benefit from, the prepared information. However, what counts as relevant and important is subjective and highly specific to the user’s information need and the task at hand. Therefore, a method must continually learn from the feedback of its users. In this thesis, we propose new approaches that put the human in the loop in order to interactively prepare information along three major lines of research: information aggregation, condensation, and recommendation.
For multiple well-studied tasks in natural language processing, we point out the limitations of existing methods and discuss how our approach can close the gap to the human upper bound by considering user feedback and adapting to the user’s information need. We put a particular focus on applications in digital journalism and introduce the new task of live blog summarization. We show that the corpora we create for this task are highly heterogeneous compared to standard summarization datasets, which poses new challenges to previously proposed non-interactive methods.
One way to alleviate information overload is information aggregation. We focus on the corresponding task of multi-document summarization and argue that previously proposed methods are of limited usefulness in real-world applications as they do not take the users’ goals into account. To address these drawbacks, we propose an interactive summarization loop to iteratively create and refine multi-document summaries based on the users’ feedback. We investigate sampling strategies based on active machine learning and joint optimization to reduce the number of iterations and the amount of user feedback required. Our approach significantly improves the quality of the summaries and reaches performance near the human upper bound. We present a system demonstration implementing the interactive summarization loop, study its scalability, and highlight its use cases in exploring document collections and creating focused summaries in journalism.
For information condensation, we investigate a text compression setup. We address the problem of neural models requiring huge amounts of training data and propose a new interactive text compression method to reduce the need for large-scale annotated data. We employ state-of-the-art Seq2Seq text compression methods as our base models and propose an active learning setup with multiple sampling strategies to efficiently use minimal training data. We find that our method significantly reduces the amount of data needed to train and that it adapts well to new datasets and domains.
Finally, we focus on information recommendation and discuss the need for explainable models in machine learning. We propose a new joint recommendation system for rating prediction and review summarization, which shows major improvements over state-of-the-art systems in both the rating prediction and the review summarization task. By solving these tasks jointly with multi-task learning techniques, we furthermore obtain explanations for a rating by showing the generated review summary, marked according to the model’s attention, and a histogram of user preferences learned from the users’ reviews.
We conclude the thesis with a summary of how human-in-the-loop approaches improve information preparation systems and envision the use of interactive machine learning methods in other areas of natural language processing as well.
Type of entry: | Dissertation | ||||
---|---|---|---|---|---|
Published: | 2020 | ||||
Author(s): | P. V. S., Avinesh | ||||
Kind of entry: | First publication | ||||
Title: | Information Preparation with the Human in the Loop | ||||
Language: | English | ||||
Referees: | Gurevych, Prof. Dr. Iryna ; Sanderson, Prof. Mark | ||||
Publication date: | 22 June 2020 | ||||
Place: | Darmstadt | ||||
Date of oral examination: | 18 July 2019 | ||||
DOI: | 10.25534/tuprints-00011839 | ||||
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/11839 | ||||
URN: | urn:nbn:de:tuda-tuprints-118394 | ||||
Dewey Decimal Classification (DDC) subject group: | 000 Generalities, computer science, information science > 004 Computer science; 400 Language > 400 Language, linguistics |
||||
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
||||
Date deposited: | 01 Jul 2020 08:39 | ||||
Last modified: | 19 Aug 2021 10:50 | ||||