An Inclusive Notion of Text

Kuznetsov, Ilia ; Gurevych, Iryna (2024)
An Inclusive Notion of Text.
The 61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada (09.-14.07.2023)
doi: 10.26083/tuprints-00027658
Conference publication, secondary publication, publisher's version

Warning: A newer version of this entry is available.

Abstract

Natural language processing (NLP) researchers develop models of grammar, meaning and communication based on written text. Due to task and data differences, what is considered text can vary substantially across studies. A conceptual framework for systematically capturing these differences is lacking. We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. Towards that goal, we propose common terminology to discuss the production and transformation of textual data, and introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling. We apply this taxonomy to survey existing work that extends the notion of text beyond the conservative language-centered view. We outline key desiderata and challenges of the emerging inclusive approach to text in NLP, and suggest community-level reporting as a crucial next step to consolidate the discussion.

Entry type: Conference publication
Published: 2024
Author(s): Kuznetsov, Ilia ; Gurevych, Iryna
Kind of entry: Secondary publication
Title: An Inclusive Notion of Text
Language: English
Publication date: 8 July 2024
Place: Darmstadt
Date of first publication: 2023
Place of first publication: Kerrville, TX, USA
Publisher: ACL
Book title: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Event title: The 61st Annual Meeting of the Association for Computational Linguistics
Event location: Toronto, Canada
Event date: 09.-14.07.2023
DOI: 10.26083/tuprints-00027658
URL / URN: https://tuprints.ulb.tu-darmstadt.de/27658
Origin: Secondary publication service
ID number: 2023.acl-long.633
Status: Publisher's version
URN: urn:nbn:de:tuda-tuprints-276586
Dewey Decimal Classification (DDC) subject group: 000 Generalities, computer science, information science > 004 Computer science
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 08 Jul 2024 09:23
Last modified: 09 Jul 2024 09:23