Miller, Tristan (2016)
Adjusting Sense Representations for Word Sense Disambiguation and Automatic Pun Interpretation.
Technische Universität Darmstadt
Dissertation, first publication
Abstract
Word sense disambiguation (WSD)—the task of determining which meaning a word carries in a particular context—is a core research problem in computational linguistics. Though it has long been recognized that supervised (machine learning–based) approaches to WSD can yield impressive results, they require an amount of manually annotated training data that is often too expensive or impractical to obtain. This is a particular problem for under-resourced languages and domains, and is also a hurdle in well-resourced languages when processing the sort of lexical-semantic anomalies employed for deliberate effect in humour and wordplay. In contrast to supervised systems are knowledge-based techniques, which rely only on pre-existing lexical-semantic resources (LSRs). These techniques are of more general applicability but tend to suffer from lower performance due to the informational gap between the target word's context and the sense descriptions provided by the LSR.
This dissertation is concerned with extending the efficacy and applicability of knowledge-based word sense disambiguation. First, we investigate two approaches for bridging the information gap and thereby improving the performance of knowledge-based WSD. In the first approach we supplement the word's context and the LSR's sense descriptions with entries from a distributional thesaurus. The second approach enriches an LSR's sense information by aligning it to other, complementary LSRs.
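The first approach can be illustrated with a minimal sketch of gloss-overlap disambiguation in which both the context and the sense descriptions are expanded with thesaurus neighbours. This is not the dissertation's implementation: the sense inventory, the thesaurus entries, the stopword list, and all function names below are hypothetical toy data chosen for illustration.

```python
from collections import Counter

# Hypothetical toy sense inventory; the glosses are not taken from any real LSR.
SENSES = {
    "bank#finance": "institution that accepts deposits and lends money",
    "bank#river": "sloping land beside a body of water",
}

# Hypothetical toy distributional thesaurus: word -> distributionally similar words.
THESAURUS = {
    "loan": ["credit", "money", "mortgage"],
    "deposits": ["savings", "funds"],
    "water": ["river", "stream"],
}

STOPWORDS = {"a", "an", "and", "for", "of", "that", "the"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def expand(tokens):
    """Return the tokens plus their thesaurus neighbours as a bag of words."""
    bag = Counter(tokens)
    for tok in tokens:
        bag.update(THESAURUS.get(tok, []))
    return bag


def overlap(a, b):
    """Size of the multiset intersection of two bags of words."""
    return sum((a & b).values())


def disambiguate(context, senses=SENSES):
    """Score each sense by the overlap of its expanded gloss with the expanded context."""
    ctx = expand(tokenize(context))
    scores = {s: overlap(ctx, expand(tokenize(g))) for s, g in senses.items()}
    return max(scores, key=scores.get), scores


best, scores = disambiguate("she asked the bank for a loan")
print(best, scores)
```

In this toy case the unexpanded context shares no content word with either gloss; only the thesaurus neighbour "money" of "loan" links the context to the financial sense, which is the kind of gap the expansion is meant to bridge.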
Our next main contribution is to adapt techniques from word sense disambiguation to a novel task: the interpretation of puns. Traditional NLP applications, including WSD, usually treat the source text as carrying a single meaning, and therefore cannot cope with the intentionally ambiguous constructions found in humour and wordplay. We describe how algorithms and evaluation methodologies from traditional word sense disambiguation can be adapted for the "disambiguation" of puns, or rather for the identification of their double meanings.
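One simple way to picture this adaptation, offered as a sketch under assumptions rather than as the method the dissertation actually uses, is to let a disambiguator return the two best-scoring senses of the pun word instead of only one. The function name and the sense scores below are hypothetical.

```python
def interpret_pun(scores):
    """Return the two highest-scoring senses as the pun's candidate double meaning."""
    if len(scores) < 2:
        raise ValueError("a pun needs at least two candidate senses")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return tuple(ranked[:2])


# Hypothetical scores for "interest" in "I used to be a banker but I lost interest".
pun_scores = {"interest#finance": 4, "interest#curiosity": 3, "interest#stake": 1}
print(interpret_pun(pun_scores))
```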
Finally, we cover the design and construction of technological and linguistic resources aimed at supporting the research and application of word sense disambiguation. Development and comparison of WSD systems has long been hampered by a lack of standardized data formats, language resources, software components, and workflows. To address this issue, we designed and implemented a modular, extensible framework for WSD. It implements, encapsulates, and aggregates reusable, interoperable components using UIMA, an industry-standard information processing architecture. We have also produced two large sense-annotated data sets for under-resourced languages or domains: one of these targets German-language text, and the other English-language puns.
Item type: | Dissertation | ||||
---|---|---|---|---|---|
Published: | 2016 | ||||
Author(s): | Miller, Tristan | ||||
Type of entry: | First publication | ||||
Title: | Adjusting Sense Representations for Word Sense Disambiguation and Automatic Pun Interpretation | ||||
Language: | English | ||||
Referees: | Gurevych, Prof. Dr. Iryna ; Mihalcea, Prof. Dr. Rada ; Balke, Prof. Dr. Wolf-Tilo | ||||
Date of publication: | 4 January 2016 | ||||
Place: | Darmstadt | ||||
Collation: | xv, 200 pages, ill. | ||||
Date of oral examination: | 22 March 2016 | ||||
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/5400 | ||||
Uncontrolled keywords: | word sense disambiguation, puns, word sense alignment, distributional similarity, natural language processing | ||||
URN: | urn:nbn:de:tuda-tuprints-54002 | ||||
Dewey Decimal Classification (DDC) subject group: | 000 Generalities, computer science, information science > 004 Computer science; 400 Language > 400 Language, linguistics | ||||
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing; DFG Research Training Groups; DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources | ||||
Date deposited: | 17 Apr 2016 19:55 | ||||
Last modified: | 09 Aug 2024 08:51 | ||||