Arnold, Thomas Otmar (2018)
Advanced Motif Analysis on Text Induced Graphs.
Technische Universität Darmstadt
Dissertation, first publication
Abstract
Motif analysis counts the number of recurring patterns (or motifs) in a graph and connects these counts to the intrinsic semantics of the graph. In this thesis, we demonstrate the potential of motif analysis on textual data and introduce new concepts that extend conventional motifs. In particular, we focus on three main research questions:
1. Can we use graph motifs to assess text quality?
Based on the open encyclopedia Wikipedia, we transform articles of various quality levels into graph structures. There, we find motifs that indicate high or low article quality, and we connect these motifs to linguistic patterns. We also show that a qualitative analysis of the most relevant patterns can yield useful insights into our understanding of quality. We then look at quality from a very different angle and analyze motifs in the user interaction of collaborative writing communities. These interaction motifs allow us to assess overall online community success, measured by a combination of growth and user traffic. Certain combinations of user groups show consistently beneficial or detrimental effects on community performance.
2. How do motifs change over time?
Having established that motif analysis can detect quality on different levels, we now focus on the progression of motifs in dynamic graphs. We take another look at Wikipedia articles, in particular at local text changes in article revisions. To capture patterns in these text revisions, we introduce metamotifs, or motifs of motifs. We also define the novel concept of motif stability: motifs of high stability tend to persist in dynamic graphs, whereas motifs of low stability almost always change into other motifs. We present strong correlations between motif stability, established motif characteristics and the quality of the source text.
3. Are metamotifs (motifs of motifs) an improvement over simple motifs and methods?
Finally, we not only confirm the capabilities of metamotifs but also quantify their predictive power in a classification experiment on political speeches. To generalize beyond the surface text level, we use semantic frames, which are more abstract than words. With a combination of semantic frames and metamotif analysis on US presidency and German Bundestag data, we confirm that metamotifs outperform traditional motifs and simpler approaches when used as machine learning features.
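The basic idea of motif counting described in the abstract can be illustrated with a small, purely hypothetical sketch. It is not the pipeline used in the thesis: the graph construction (linking each word to the word that follows it), the function names `cooccurrence_graph` and `count_three_node_motifs`, and the restriction to undirected three-node patterns are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(tokens):
    """Build an undirected graph linking each word to the word that follows it.

    Assumption for illustration: the thesis may induce graphs from text differently.
    """
    adj = {}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def count_three_node_motifs(adj):
    """Count the connected induced three-node patterns: open paths and triangles."""
    counts = Counter()
    for trio in combinations(sorted(adj), 3):
        # Number of edges among the three chosen nodes.
        edges = sum(1 for u, v in combinations(trio, 2) if v in adj[u])
        if edges == 3:
            counts["triangle"] += 1
        elif edges == 2:
            counts["open_path"] += 1
    return counts


tokens = "the cat saw the dog and the cat ran".split()
graph = cooccurrence_graph(tokens)
motif_counts = count_three_node_motifs(graph)
print(dict(motif_counts))
```

Enumerating every node triple is cubic in the number of nodes, so this brute-force approach is only suitable for very small graphs; it is meant to show what a motif count is, not how to compute one at scale.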
Item type: | Dissertation |
---|---|
Published: | 2018 |
Author(s): | Arnold, Thomas Otmar |
Type of entry: | First publication |
Title: | Advanced Motif Analysis on Text Induced Graphs |
Language: | English |
Referees: | Weihe, Prof. Dr. Karsten ; Gurevych, Prof. Dr. Iryna ; Müller-Hannemann, Prof. Dr. Matthias |
Publication date: | 30 May 2018 |
Place: | Darmstadt |
Date of oral examination: | 24 May 2018 |
URL / URN: | http://tuprints.ulb.tu-darmstadt.de/7442 |
URN: | urn:nbn:de:tuda-tuprints-74428 |
Dewey Decimal Classification (DDC) subject group: | 000 Generalities, computer science, information science > 004 Computer science ; 400 Language > 400 Language, linguistics |
Department(s)/field(s): | 20 Department of Computer Science ; 20 Department of Computer Science > Algorithmics ; DFG Research Training Groups ; DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources |
Date deposited: | 03 Jun 2018 19:55 |
Last modified: | 03 Jun 2018 19:55 |
PPN: | |