Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification

Bates, Luke ; Gurevych, Iryna (2024)
Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification.
18th Conference of the European Chapter of the Association for Computational Linguistics. St. Julian's, Malta (17-22 March 2024)
Conference publication, bibliography

Abstract

Few-shot text classification systems have impressive capabilities but are infeasible to deploy and use reliably due to their dependence on prompting and billion-parameter language models. SetFit (Tunstall, 2022) is a recent, practical approach that fine-tunes a Sentence Transformer under a contrastive learning paradigm and achieves results similar to those of more unwieldy systems. Inexpensive text classification is important for addressing the problem of domain drift in all classification tasks, and especially in detecting harmful content, which plagues social media platforms. Here, we propose Like a Good Nearest Neighbor (LaGoNN), a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor in the training data, for example, the label and text, making novel data appear similar to an instance on which the model was optimized. LaGoNN is effective at flagging undesirable content and at text classification, and improves SetFit’s performance. To demonstrate LaGoNN’s value, we conduct a thorough study of text classification systems in the context of content moderation under four label distributions, and in general and multilingual classification settings.
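
The input modification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a Sentence Transformer encoder, and the helper names (`nearest_neighbor`, `augment`), the `[SEP]` separator, and the ordering of label and text are all assumptions made for illustration. The sketch only shows the general idea of looking up the nearest training instance and appending its label and text to the input, without adding learnable parameters.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a Sentence Transformer."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbor(text, train):
    """Return the (text, label) training pair most similar to `text`."""
    query = embed(text)
    return max(train, key=lambda pair: cosine(query, embed(pair[0])))


def augment(text, train, sep=" [SEP] "):
    """Append the neighbor's label and text to the input (hypothetical format)."""
    nn_text, nn_label = nearest_neighbor(text, train)
    return f"{text}{sep}{nn_label}{sep}{nn_text}"


# Tiny labeled training set, invented for this example.
train = [
    ("you are a complete idiot", "harmful"),
    ("thanks for the helpful answer", "neutral"),
    ("what a lovely day outside", "neutral"),
]

augmented = augment("you idiot, nobody asked you", train)
print(augmented)
```

In a pipeline of this kind, the augmented string rather than the raw input would be passed to the classifier, so that a novel input carries a cue from a labeled instance the model has already seen.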

Item type: Conference publication
Published: 2024
Author(s): Bates, Luke ; Gurevych, Iryna
Entry kind: Bibliography
Title: Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification
Language: English
Publication date: 23 March 2024
Publisher: ACL
Book title: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Event title: 18th Conference of the European Chapter of the Association for Computational Linguistics
Event location: St. Julian's, Malta
Event dates: 17-22 March 2024
URL / URN: https://aclanthology.org/2024.eacl-long.17
Uncontrolled keywords: UKP_p_seditrah_factcheck
Department(s)/research area(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 12 Apr 2024 11:03
Last modified: 06 Aug 2024 12:41
PPN: 520386337