Stance Detection Benchmark: How Robust is Your Stance Detection?

Schiller, Benjamin ; Daxenberger, Johannes ; Gurevych, Iryna (2024)
Stance Detection Benchmark: How Robust is Your Stance Detection?
In: KI - Künstliche Intelligenz : German Journal of Artificial Intelligence, 2021, 35 (3-4)
doi: 10.26083/tuprints-00023506
Article, secondary publication, publisher's version

Warning: A newer version of this entry is available.

Abstract

Stance detection (StD) aims to detect an author's stance towards a certain topic and has become a key component in applications like fake news detection, claim validation, and argument search. However, while stance is easily detected by humans, machine learning (ML) models clearly fall short on this task. Given the major differences in dataset sizes and in the framing of StD (e.g., the number of classes and inputs), ML models trained on a single dataset usually generalize poorly to other domains. Hence, we introduce a StD benchmark that allows ML models to be compared against a wide variety of heterogeneous StD datasets in order to evaluate their generalizability and robustness. Moreover, the framework is designed for easy integration of new datasets and probing methods for robustness. Amongst several baseline models, we define a model that learns from all ten StD datasets of various domains in a multi-dataset learning (MDL) setting and present new state-of-the-art results on five of the datasets. Yet, the models still perform well below human capabilities, and even simple perturbations of the original test samples (adversarial attacks) severely hurt the performance of MDL models. Deeper investigation suggests overfitting on dataset biases as the main reason for the decreased robustness. Our analysis emphasizes the need to focus on robustness and de-biasing strategies in multi-task learning approaches. To foster research on this important topic, we release the dataset splits, code, and fine-tuned weights.
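
The multi-dataset learning (MDL) setup described above can be pictured as a shared text encoder with one stance-classification head per dataset, trained jointly across all datasets. The following Python sketch illustrates that idea; the dataset names, label sets, sampling scheme, and hyperparameters are illustrative assumptions, not the authors' published configuration (see the released code for the actual setup).

# Minimal sketch of multi-dataset learning (MDL) for stance detection:
# one shared BERT encoder, one classification head per dataset.
# Dataset names, label sets, and the single training step below are
# illustrative assumptions, not the configuration used in the paper.
import random
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

DATASETS = {  # assumed label inventories for three of the benchmark datasets
    "ibmcs": ["PRO", "CON"],
    "semeval2016t6": ["FAVOR", "AGAINST", "NONE"],
    "snopes": ["agree", "refute", "nostance"],
}

class MultiDatasetStanceModel(nn.Module):
    """Shared encoder with dataset-specific linear heads."""
    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, len(labels)) for name, labels in DATASETS.items()}
        )

    def forward(self, dataset, **encoded):
        cls = self.encoder(**encoded).last_hidden_state[:, 0]  # [CLS] representation
        return self.heads[dataset](cls)  # logits over that dataset's labels

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = MultiDatasetStanceModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

# One illustrative MDL step: sample a dataset, encode a (topic, text) pair,
# and update the shared encoder together with that dataset's head.
dataset = random.choice(list(DATASETS))
batch = tokenizer(["climate change"], ["We must cut emissions now."],
                  padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([0])  # dummy gold label for the sketch
loss = loss_fn(model(dataset, **batch), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()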

Item type: Article
Published: 2024
Author(s): Schiller, Benjamin ; Daxenberger, Johannes ; Gurevych, Iryna
Type of entry: Secondary publication
Titel: Stance Detection Benchmark: How Robust is Your Stance Detection?
Language: English
Date of publication: 2 April 2024
Place of publication: Darmstadt
Date of first publication: November 2021
Place of first publication: Berlin
Publisher: Springer
Journal, newspaper, or series title: KI - Künstliche Intelligenz : German Journal of Artificial Intelligence
Volume: 35
Issue: 3-4
DOI: 10.26083/tuprints-00023506
URL / URN: https://tuprints.ulb.tu-darmstadt.de/23506
Origin: Secondary publication (DeepGreen)

Uncontrolled keywords: Stance detection, Robustness, Multi-dataset learning
Status: Publisher's version
URN: urn:nbn:de:tuda-tuprints-235062
Additional information: NLP and Semantics

Dewey Decimal Classification (DDC) subject group: 000 Generalities, computer science, information science > 004 Computer science
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 02 Apr 2024 11:20
Last modified: 03 Apr 2024 05:17
