Koto, Fajri ; Beck, Tilman ; Talat, Zeerak ; Gurevych, Iryna ; Baldwin, Timothy (2024)
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon.
18th Conference of the European Chapter of the Association for Computational Linguistics. St. Julian's, Malta (17.-22.03.2024)
Conference publication, Bibliography
Abstract
Improving multilingual language models' capabilities in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages. In this paper, we relax the reliance on texts in low-resource languages by using multilingual lexicons in pretraining to enhance multilingual capabilities. Specifically, we focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets. We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets, and to large language models like GPT-3.5, BLOOMZ, and XGLM. These findings hold from unseen low-resource languages to code-mixed scenarios involving high-resource languages.
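The paper's key idea is that word-level polarity information from a multilingual lexicon can substitute for sentence-level sentiment labels. As a rough, toy-scale illustration of that principle (not the authors' actual pretraining method), the sketch below classifies a sentence by summing word polarities from a small invented lexicon; all entries and the function name are illustrative assumptions:

```python
# Toy illustration: zero-shot sentiment via a multilingual word-polarity
# lexicon, with no sentence-level training data. The entries below are
# invented examples, not the lexicon used in the paper.
LEXICON = {
    "good": 1, "bad": -1,        # English
    "bagus": 1, "buruk": -1,     # Indonesian
    "gut": 1, "schlecht": -1,    # German
}

def lexicon_sentiment(text: str) -> str:
    """Sum the polarities of known words; the sign of the total decides."""
    score = sum(LEXICON.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the lexicon is keyed on surface forms across languages, the same scorer handles code-mixed input (e.g. an English sentence containing "bagus") without any language identification step.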
Entry type: | Conference publication |
---|---|
Published: | 2024 |
Author(s): | Koto, Fajri ; Beck, Tilman ; Talat, Zeerak ; Gurevych, Iryna ; Baldwin, Timothy |
Entry category: | Bibliography |
Title: | Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon |
Language: | English |
Date of publication: | 23 March 2024 |
Publisher: | ACL |
Book title: | Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) |
Event title: | 18th Conference of the European Chapter of the Association for Computational Linguistics |
Event location: | St. Julian's, Malta |
Event dates: | 17-22 March 2024 |
URL / URN: | https://aclanthology.org/2024.eacl-long.18 |
Department(s): | 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 12 Apr 2024 11:01 |
Last modified: | 06 Aug 2024 12:20 |
PPN: | 520385349 |