Reimers, Nils Fabian (2018)
Universal Machine Learning Methods for Detecting and Temporal Anchoring of Events.
Technische Universität Darmstadt
Dissertation, first publication
Abstract
Event detection has many use cases, for example summarization, automatic timeline generation, and automatic knowledge base population. However, there is no commonly agreed-upon definition of what counts as an event or of how events are expressed in text. As a consequence, many different definitions, annotation schemes, and corpora have been published, often focusing on specific applications. For a new application, there is a high chance that new data must be annotated and that a machine learning approach must be trained and tuned specifically for this new dataset.
Instead of a system that works well for one specific dataset, we are interested in a universal learning approach that can be used for a wide range of event detection tasks. In this thesis, we analyze an architecture that is based on bidirectional long short-term memory networks (BiLSTM) and conditional random fields (CRF). The BiLSTM-CRF architecture has been used successfully by other researchers for sequence tagging tasks and is a strong candidate for the task of event detection. However, besides numerous hyperparameters, researchers have also published various modifications and extensions of this architecture. These parameters and design choices can have a large impact on performance, and selecting them correctly can make the difference between mediocre and state-of-the-art performance. It is not clear which parameters and design choices are relevant. This slows the adaptation of the approach to new datasets and requires expert experience and sometimes brute-force search to find optimal parameters. This situation is especially unfavorable for event detection, where datasets are often application-specific.
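To make the role of the CRF component more concrete, the following is a minimal sketch of Viterbi decoding over per-token tag scores. It is not the implementation evaluated in the thesis: the function name `viterbi_decode`, the three-tag inventory (O, B, I), and all scores are assumptions chosen for illustration, and the emission scores that a BiLSTM would normally produce are simply given as lists.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][k]   : score of tag k at token t (in a BiLSTM-CRF these
                        would come from the BiLSTM; here they are given).
    transitions[j][k] : score of moving from tag j to tag k.
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[k] = best score of any path that ends in tag k at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for k in range(num_tags):
            # pick the previous tag that maximizes the path score into tag k
            prev = max(range(num_tags), key=lambda j: best[j] + transitions[j][k])
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    path = [max(range(num_tags), key=lambda k: best[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag inventory: 0 = O, 1 = B (event start), 2 = I (event inside)
emissions = [[1.0, 0.8, 0.0],   # token 0 slightly prefers O
             [0.2, 0.1, 1.0]]   # token 1 prefers I
transitions = [[0.0, 0.0, -10.0],  # O -> I is heavily penalized
               [0.0, 0.0, 0.5],    # B -> I is encouraged
               [0.0, 0.0, 0.0]]
no_transitions = [[0.0] * 3 for _ in range(3)]

print(viterbi_decode(emissions, no_transitions))  # per-token argmax: [0, 2]
print(viterbi_decode(emissions, transitions))     # transition-aware:  [1, 2]
```

Without transition scores the decoder picks each token's best tag independently and produces the inconsistent sequence O, I. With them, it prefers the consistent sequence B, I, which is the kind of label dependency a CRF output layer is meant to capture.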
To accelerate adaptation to new tasks, we provide an extensive evaluation of the BiLSTM-CRF architecture and its individual components and parameters. We identify which parts are relevant for achieving good performance and which parameters are important to tune for specific tasks. We derive a standard configuration for the architecture that worked well for various tasks. We then show that the BiLSTM-CRF architecture with the proposed default configuration achieves strong results on different event detection tasks.
In most applications, we want to know not only that an event happened but also when it happened. Different methods for annotating temporal information for events have been proposed. In an annotation study, we show that existing annotation schemes have major drawbacks in providing temporal information for events, at least for news articles: they provide insufficient temporal information for the majority of events. This is because they limit the annotation scope to a single sentence or two neighboring sentences, whereas the study shows that the relevant temporal information for an event can be several sentences away from the event mention. We developed a new annotation scheme that addresses the shortcomings of previous schemes and requires about 85% less annotation effort, yet provides better temporal information for events in a document.
While the new scheme requires less human effort, it creates new challenges for automatic event time extraction systems. Existing schemes can be modeled as a pairwise classification task, but this is no longer possible for the new scheme. Instead, the whole document must be considered, and information from different parts of the document must be merged. We propose an automatic system that uses a decision tree with convolutional neural networks as local classifiers. The neural networks consider the whole document. The final label is derived stepwise, with different branching options. Compared to state-of-the-art systems, the developed architecture significantly improves the accuracy of event time extraction on our annotated data. Further, it generalizes well to other datasets and tasks. Without adaptation, it improved the F1-score for automatic event timeline generation in SemEval-2015 Task 4 by 4.01 percentage points.
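The stepwise derivation of a label through a tree of local classifiers can be sketched as follows. This is only an illustration under strong assumptions: the convolutional networks are replaced by trivial stub functions, and the names `Node`, `derive_label`, the branch names, and the final labels are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Document = Dict[str, str]  # hypothetical stand-in for a document representation


@dataclass
class Node:
    # Local classifier: looks at the document and returns a branch name.
    classify: Callable[[Document], str]
    # Each branch leads either to another decision node or to a final label.
    branches: Dict[str, Union["Node", str]]


def derive_label(root: Node, document: Document) -> str:
    """Walk the tree from the root, one local decision per step."""
    current: Union[Node, str] = root
    while isinstance(current, Node):
        current = current.branches[current.classify(document)]
    return current


# Stub classifiers (assumption: stand-ins for trained neural networks).
precision_node = Node(
    classify=lambda doc: "exact" if doc.get("precision") == "exact" else "approximate",
    branches={"exact": "EXACT_DATE", "approximate": "APPROXIMATE_DATE"},
)
tree = Node(
    classify=lambda doc: "dated" if doc.get("date") else "undated",
    branches={"dated": precision_node, "undated": "UNKNOWN"},
)

print(derive_label(tree, {"date": "2018-05-03", "precision": "exact"}))
print(derive_label(tree, {}))
```

Each node only has to solve a small local decision, and the path taken through the tree determines the final label.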
The final part of the thesis addresses the evaluation of machine learning approaches. Comparing approaches is a major driving force in our research community, which tries to improve the state of the art for tasks of interest. The question arises of how reliably our evaluation methods spot differences between approaches. We investigate two evaluation setups that are commonly found in scientific publications and that are the de facto evaluation setups for shared tasks. We show that these setups are unsuitable for comparing learning approaches and that they introduce a high risk of drawing wrong conclusions. We identify different sources of variation that must be addressed when comparing machine learning approaches and discuss the difficulties of addressing those sources of variation.
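One way to see why a single score per approach can mislead is to compare score distributions instead. The sketch below assumes that each approach was trained several times (for example with different random seeds, one possible source of variation) and that one test score per run is available. The function name `compare_score_distributions` and the toy scores are illustrative assumptions, not results from the thesis.

```python
from itertools import product
from statistics import mean, stdev
from typing import Dict, Sequence


def compare_score_distributions(scores_a: Sequence[float],
                                scores_b: Sequence[float]) -> Dict[str, float]:
    """Summarize two sets of per-run scores (at least two runs each).

    'single_run_a_wins' is the fraction of (run of A, run of B) pairs in
    which A scores strictly higher, i.e. how often a comparison based on
    one run per approach would declare A the winner.
    """
    pairs = list(product(scores_a, scores_b))
    a_wins = sum(1 for a, b in pairs if a > b)
    return {
        "mean_a": mean(scores_a),
        "mean_b": mean(scores_b),
        "std_a": stdev(scores_a),
        "std_b": stdev(scores_b),
        "single_run_a_wins": a_wins / len(pairs),
    }


# Toy scores in percent (made up for illustration only).
scores_a = [90.0, 91.0, 89.0]
scores_b = [90.5, 88.0, 91.5]
summary = compare_score_distributions(scores_a, scores_b)
print(summary)
```

In this toy example both approaches have the same mean score, yet a comparison based on one run each would favor A in 4 of 9 pairings and B in the other 5, depending only on which runs happened to be picked.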
Entry type: | Dissertation |
---|---|
Published: | 2018 |
Author(s): | Reimers, Nils Fabian |
Type of entry: | First publication |
Title: | Universal Machine Learning Methods for Detecting and Temporal Anchoring of Events |
Language: | English |
Referees: | Gurevych, Prof. Dr. Iryna ; Weikum, Prof. Dr. Gerhard ; Roth, Prof. Dan |
Year of publication: | 2018 |
Place: | Darmstadt |
Date of oral examination: | 3 May 2018 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/8163 |
URN: | urn:nbn:de:tuda-tuprints-81634 |
Dewey Decimal Classification (DDC) subject group: | 000 Generalities, computer science, information science > 004 Computer science |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 23 Dec 2018 20:55 |
Last modified: | 23 Dec 2018 20:55 |