Kuznetsov, Ilia ; Gurevych, Iryna (2023)
An Inclusive Notion of Text.
61st Annual Meeting of the Association for Computational Linguistics. Toronto, Canada (09.07.2023-14.07.2023)
Conference publication, Bibliography
This is the latest version of this entry.
Abstract
Natural language processing (NLP) researchers develop models of grammar, meaning and communication based on written text. Due to task and data differences, what is considered text can vary substantially across studies. A conceptual framework for systematically capturing these differences is lacking. We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. Towards that goal, we propose common terminology to discuss the production and transformation of textual data, and introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling. We apply this taxonomy to survey existing work that extends the notion of text beyond the conservative language-centered view. We outline key desiderata and challenges of the emerging inclusive approach to text in NLP, and suggest community-level reporting as a crucial next step to consolidate the discussion.
Item type: | Conference publication |
---|---|
Published: | 2023 |
Author(s): | Kuznetsov, Ilia ; Gurevych, Iryna |
Entry type: | Bibliography |
Title: | An Inclusive Notion of Text |
Language: | English |
Publication date: | 10 July 2023 |
Publisher: | ACL |
Book title: | The 61st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference Volume 1: Long Papers |
Event title: | 61st Annual Meeting of the Association for Computational Linguistics |
Event location: | Toronto, Canada |
Event dates: | 09.07.2023-14.07.2023 |
URL / URN: | https://aclanthology.org/2023.acl-long.633/ |
Uncontrolled keywords: | UKP_p_LOEWE_Spitzenprofessur, UKP_p_InterText |
Additional information: | First publication |
Division(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 09 Aug 2023 09:39 |
Last modified: | 09 Jul 2024 09:23 |
PPN: | 510577407 |
Available versions of this entry
- An Inclusive Notion of Text. (deposited 08 Jul 2024 09:23)
- An Inclusive Notion of Text. (deposited 09 Aug 2023 09:39) [currently displayed]