The Lou Dataset - Exploring the Impact of Gender-Fair Language in German Text Classification

Waldis, Andreas ; Birrer, Joel ; Lauscher, Anne ; Gurevych, Iryna (2024)
The Lou Dataset - Exploring the Impact of Gender-Fair Language in German Text Classification.
29th Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.emnlp-main.592
Conference publication, Bibliography

URL / URN: https://aclanthology.org/2024.emnlp-main.592/

Abstract

Gender-fair language, an evolving linguistic variation in German, fosters inclusion by addressing all genders or using neutral forms. However, there is a notable lack of resources to assess the impact of this language shift on language models (LMs) that might not have been trained on examples of this variation. Addressing this gap, we present Lou, the first dataset providing high-quality reformulations for German text classification, covering seven tasks such as stance detection and toxicity classification. We evaluate 16 monolingual and multilingual LMs and find substantial label flips, reduced prediction certainty, and significantly altered attention patterns. Nevertheless, existing evaluations remain valid, as LM rankings are consistent across original and reformulated instances. Our study provides initial insights into the impact of gender-fair language on classification for German; however, these findings are likely transferable to other languages, as we found consistent patterns in multilingual and English LMs.
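The abstract measures three effects: label flips, prediction certainty, and whether model rankings survive reformulation. The following is a minimal sketch of how such quantities could be computed. It assumes that, for every original instance and its gender-fair reformulation, aligned predicted labels and class probabilities are available, and that each model has one aggregate score per condition. The function names and toy data are hypothetical, and this is not the authors' evaluation code. Certainty is approximated here by the mean maximum class probability, which is only one possible proxy. Rank consistency uses Kendall's tau-a, with tied pairs counting as neither concordant nor discordant.

```python
from itertools import combinations


def label_flip_rate(original_preds, reformulated_preds):
    """Hypothetical helper: share of instances whose predicted label changes."""
    if len(original_preds) != len(reformulated_preds):
        raise ValueError("predictions must be aligned instance by instance")
    if not original_preds:
        return 0.0
    flips = sum(o != r for o, r in zip(original_preds, reformulated_preds))
    return flips / len(original_preds)


def mean_max_probability(prob_rows):
    """Hypothetical certainty proxy: average of each instance's top class probability."""
    if not prob_rows:
        raise ValueError("need at least one probability row")
    return sum(max(row) for row in prob_rows) / len(prob_rows)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists for the same models (no tie correction)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two aligned model scores")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Toy stance labels for four instances before and after reformulation (made-up data).
    original = ["favor", "against", "none", "favor"]
    reformulated = ["favor", "none", "none", "against"]
    print(label_flip_rate(original, reformulated))  # 0.5

    # Toy aggregate scores for three hypothetical models on each condition.
    print(kendall_tau([81.0, 77.0, 69.0], [78.0, 74.0, 70.0]))  # 1.0
```

In this toy setup, half of the labels flip while the model ranking is unchanged (tau = 1.0), which mirrors the kind of pattern the abstract describes: individual predictions shift, but comparative evaluations stay stable.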

Entry type:	Conference publication
Published:	2024
Author(s):	Waldis, Andreas ; Birrer, Joel ; Lauscher, Anne ; Gurevych, Iryna
Entry kind:	Bibliography
Title:	The Lou Dataset - Exploring the Impact of Gender-Fair Language in German Text Classification
Language:	English
Publication date:	November 2024
Publisher:	ACL
Book title:	EMNLP 2024: The 2024 Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference
Event title:	29th Conference on Empirical Methods in Natural Language Processing
Event location:	Miami, USA
Event dates:	12.11.2024 - 16.11.2024
DOI:	10.18653/v1/2024.emnlp-main.592
URL / URN:	https://aclanthology.org/2024.emnlp-main.592/
Department(s)/Field(s):	20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited:	17 Dec 2024 11:42
Last modified:	17 Dec 2024 11:42