BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature

Mélanie-Becquet, Frédérique ; Barré, Jean ; Seminck, Olga ; Plancq, Clément ; Naguib, Marco ; Pastor, Martial ; Poibeau, Thierry (2024)
BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature.
doi: 10.26083/tuprints-00027396
Report, First publication, Preprint

Abstract

This paper presents BookNLP-fr: the adaptation to French of BookNLP, an existing NLP pipeline tailored for literary texts in English. We provide an overview of the challenges involved in the adaptation of such a pipeline to a new language: from the challenges related to data annotation up to the development of specialized modules of entity recognition and coreference. Moving beyond the technical aspects, we explore practical applications of BookNLP-fr with a canonical task for computational literary studies: subgenre classification. We show that BookNLP-fr provides more relevant and – even more importantly – more interpretable features to perform automatic subgenre classification than the traditional bag-of-words approach. BookNLP-fr makes NLP techniques available to a larger public and constitutes a new toolkit to process large numbers of digitized books in French. This allows the field to gain a deeper literary understanding through the practice of distant reading.

Item type: Report
Published: 2024
Author(s): Mélanie-Becquet, Frédérique ; Barré, Jean ; Seminck, Olga ; Plancq, Clément ; Naguib, Marco ; Pastor, Martial ; Poibeau, Thierry
Type of entry: First publication
Title: BookNLP-fr, the French Versant of BookNLP. A Tailored Pipeline for 19th and 20th Century French Literature
Language: English
Publication date: 28 May 2024
Place: Darmstadt
Issue number: 1
Series: CCLS2024 Conference Preprints
Volume in series: 3
Collation: 34 pages
DOI: 10.26083/tuprints-00027396
URL / URN: https://tuprints.ulb.tu-darmstadt.de/27396
Keywords: Natural Language Processing, Computational Literary Studies, French Literature, Coreference Resolution, Entity Recognition, Subgenre Classification
Status: Preprint
URN: urn:nbn:de:tuda-tuprints-273969
Additional information:

This paper has been submitted to the conference track of JCLS. It has been peer reviewed and accepted for presentation and discussion at the 3rd Annual Conference of Computational Literary Studies at Vienna, Austria, in June 2024.

Dewey Decimal Classification (DDC) subject group: 800 Literature > 800 Literature, rhetoric, literary studies
Department(s)/field(s): 02 Department of History and Social Sciences > Institute of Linguistics and Literary Studies > Digital Philology - Modern German Literary Studies
02 Department of History and Social Sciences
02 Department of History and Social Sciences > Institute of Linguistics and Literary Studies
Date deposited: 28 May 2024 07:53
Last modified: 03 Jun 2024 10:40