Reimitz, Tabea Michaela
Eds.: Bartsch, Sabine ; Gius, Evelyn ; Müller, Marcus ; Rapp, Andrea ; Weitin, Thomas (2021)
Exploring Experiences of Migration: A Corpus Linguistic Study of Irish Emigrant Letters.
doi: 10.26083/tuprints-00020263
Book, Primary publication, Publisher's version
Abstract
Over the last few centuries Ireland has experienced an unparalleled outflow of its inhabitants, the majority of whom left their homes in the hope of finding a better life abroad. The personal experiences of these emigrants are most vividly documented in the multitude of letters they sent home to the families and loved ones they left behind.
Following a data-driven approach, the current study aims to explore a collection of these Irish emigrant letters with the help of computer-aided methods. The main focus of the project lies on the thematic investigation of these correspondences, which is carried out with the help of topic modelling. Topic modelling is a fully automated methodology capable of recognising patterns of word co-occurrences in large collections of text. Thus, these algorithms allow us to gain insights into the recurring themes found in large text corpora, which would not be possible to capture manually. However, due to its inherent neglect of contextual information, topic modelling remains a controversial method in the field of corpus linguistics. For this reason, the present project also critically evaluates the validity of the results provided by the algorithm. In addition, this paper serves to outline the various steps involved in the compilation and preprocessing of the letter corpus used, thus providing a comprehensive overview of the entire research process underlying the computer-aided study of text corpora.
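As a rough illustration of the kind of word co-occurrence signal that topic models build on, the following Python sketch counts how many documents each pair of words shares. The toy letters and the function name `cooccurrence_counts` are hypothetical and not taken from the study; an actual topic model infers latent topics probabilistically rather than counting pairs directly.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for text in documents:
        # Use a set so that each pair is counted at most once per document;
        # sorting gives every pair a canonical (alphabetical) order.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy "letters", invented purely for illustration
letters = [
    "dear mother the farm is good",
    "dear mother the voyage was long",
    "the farm needs work",
]

pair_counts = cooccurrence_counts(letters)
print(pair_counts[("dear", "mother")])
print(pair_counts[("farm", "the")])
```

Pairs that recur across many letters hint at a shared theme, but raw counts ignore word order and context, which is the limitation the abstract refers to.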
The study is based on the electronic Irish Emigrant Letter Corpus (IELC) compiled for this project, which comprises a total of 3,247 Irish emigrant letters covering a period from the 18th to the late 20th century. The compilation, preprocessing and analysis of the corpus are conducted with the help of the programming language Python, which offers a variety of highly efficient libraries and packages for Natural Language Processing tasks. The topic model itself is implemented using McCallum’s (2002) Machine Learning for Language Toolkit (MALLET).
The results obtained indicate that topic modelling is indeed a powerful tool for identifying thematic clusters within large and unstructured collections of texts. Based on the results provided by the algorithm, it is not only possible to delineate different thematic domains within the Irish emigrant letters, but also to derive a proportional distribution of these topics across the entire corpus. However, it becomes clear from the results that the coherence of the individual topics varies greatly. Hence, a deeper look at the texts themselves and at the contexts in which the individual topics occur remains essential in order to be able to draw more detailed conclusions regarding the thematic nature of the letters.
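To make the notion of a proportional distribution of topics across the corpus concrete, the sketch below averages per-letter topic proportions into corpus-wide shares. The input values and the function name `corpus_topic_shares` are hypothetical. The sketch assumes that each letter comes with a list of topic weights summing to 1 and that every letter counts equally; the study itself may have proceeded differently, for example by weighting letters by length.

```python
def corpus_topic_shares(doc_topic_weights):
    """Average per-document topic proportions into corpus-wide shares."""
    if not doc_topic_weights:
        raise ValueError("need at least one document")
    n_topics = len(doc_topic_weights[0])
    totals = [0.0] * n_topics
    for weights in doc_topic_weights:
        if len(weights) != n_topics:
            raise ValueError("all documents must have the same number of topics")
        for k, weight in enumerate(weights):
            totals[k] += weight
    n_docs = len(doc_topic_weights)
    # Each document contributes equally (an assumption of this sketch)
    return [total / n_docs for total in totals]


# Hypothetical topic proportions for four letters and three topics
doc_topics = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.75, 0.0, 0.25],
    [0.5, 0.25, 0.25],
]

shares = corpus_topic_shares(doc_topics)
for k, share in enumerate(shares):
    print(f"topic {k}: {share:.2f}")
```

Because each letter's weights sum to 1, the resulting shares also sum to 1 and can be read as the fraction of the corpus attributed to each topic.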
In conclusion, topic modelling proved to be a valuable method for the explorative study of Irish emigrant letters and helped to reveal the rich thematic spectrum covered in these correspondences. Future research projects that intend to conduct computer-aided content analyses of emigrant letters may wish to implement more sophisticated topic models or supplement their findings with more context-sensitive methods in order to enable a deeper thematic exploration of these text sources.
Item type: | Book |
---|---|
Published: | 2021 |
Editors: | Bartsch, Sabine ; Gius, Evelyn ; Müller, Marcus ; Rapp, Andrea ; Weitin, Thomas |
Author(s): | Reimitz, Tabea Michaela |
Type of entry: | Primary publication |
Title: | Exploring Experiences of Migration: A Corpus Linguistic Study of Irish Emigrant Letters |
Language: | English |
Year of publication: | 2021 |
Place of publication: | Darmstadt |
Series: | Digital Philology \| Evolving Scholarship in Digital Philology |
Volume in series: | 3 |
Collation: | 81 pages |
DOI: | 10.26083/tuprints-00020263 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/20263 |
Status: | Publisher's version |
URN: | urn:nbn:de:tuda-tuprints-202637 |
Additional information: | Keywords: migration, letters, topic modelling, corpus linguistics, Irish emigrant letters, Python, MALLET |
Dewey Decimal Classification (DDC): | 000 Generalities, Computer science, Information science > 000 Generalities, Science; 400 Language > 400 Language, Linguistics; 400 Language > 420 English; 400 Language > 430 German; 800 Literature > 800 Literature, Rhetoric, Literary studies; 800 Literature > 820 English literature; 800 Literature > 830 German literature |
Department(s)/Field(s): | 02 Department of History and Social Sciences; 02 Department of History and Social Sciences > Institute of Linguistics and Literary Studies |
Date deposited: | 23 Dec 2021 09:31 |
Last modified: | 06 Jan 2022 10:13 |