Mélanie-Becquet, Frédérique ; Barré, Jean ; Seminck, Olga ; Plancq, Clément ; Naguib, Marco ; Pastor, Martial ; Poibeau, Thierry (2024)
BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature.
doi: 10.26083/tuprints-00027396
Report, First publication, Preprint
Abstract
This paper presents BookNLP-fr, the adaptation to French of BookNLP, an existing NLP pipeline tailored for literary texts in English. We provide an overview of the challenges involved in adapting such a pipeline to a new language, from data annotation to the development of specialized modules for entity recognition and coreference. Moving beyond the technical aspects, we explore practical applications of BookNLP-fr through a canonical task in computational literary studies: subgenre classification. We show that BookNLP-fr provides more relevant and, even more importantly, more interpretable features for automatic subgenre classification than the traditional bag-of-words approach. BookNLP-fr makes NLP techniques available to a wider audience and constitutes a new toolkit for processing large numbers of digitized books in French, allowing the field to gain a deeper literary understanding through the practice of distant reading.
Item type: | Report |
---|---|
Published: | 2024 |
Author(s): | Mélanie-Becquet, Frédérique ; Barré, Jean ; Seminck, Olga ; Plancq, Clément ; Naguib, Marco ; Pastor, Martial ; Poibeau, Thierry |
Publication type: | First publication |
Title: | BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature |
Language: | English |
Publication date: | 28 May 2024 |
Place: | Darmstadt |
Issue number: | 1 |
Series: | CCLS2024 Conference Preprints |
Volume in series: | 3 |
Collation: | 34 pages |
DOI: | 10.26083/tuprints-00027396 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/27396 |
Keywords: | Natural Language Processing, Computational Literary Studies, French Literature, Coreference Resolution, Entity Recognition, Subgenre Classification |
Status: | Preprint |
URN: | urn:nbn:de:tuda-tuprints-273969 |
Additional information: | This paper has been submitted to the conference track of JCLS. It has been peer reviewed and accepted for presentation and discussion at the 3rd Annual Conference of Computational Literary Studies in Vienna, Austria, in June 2024. |
Dewey Decimal Classification (DDC): | 800 Literature > 800 Literature, rhetoric, literary studies |
Department(s)/field(s): | 02 Fachbereich Gesellschafts- und Geschichtswissenschaften > Institut für Sprach- und Literaturwissenschaft > Digital Philology - Neuere deutsche Literaturwissenschaft; 02 Fachbereich Gesellschafts- und Geschichtswissenschaften; 02 Fachbereich Gesellschafts- und Geschichtswissenschaften > Institut für Sprach- und Literaturwissenschaft |
Date deposited: | 28 May 2024 07:53 |
Last modified: | 03 Jun 2024 10:40 |