
Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets

Schiller, Benjamin ; Daxenberger, Johannes ; Waldis, Andreas ; Gurevych, Iryna (2024)
Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets.
29th Conference on Empirical Methods in Natural Language Processing. Miami, USA (12.11.2024 - 16.11.2024)
doi: 10.18653/v1/2024.emnlp-main.608
Conference publication, Bibliography

Abstract

Topic-Dependent Argument Mining (TDAM), that is, extracting and classifying argument components for a specific topic from large document sources, is an inherently difficult task for machine learning models and humans alike, as large TDAM datasets are rare and recognition of argument components requires expert knowledge. The task becomes even more difficult if it also involves stance detection of retrieved arguments. In this work, we investigate the effect of TDAM dataset composition in few- and zero-shot settings. Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, using carefully composed training samples and reducing the training sample size by up to almost 90% can still yield 95% of the maximum performance. This gain is consistent across three TDAM tasks on three different datasets. We also publish a new dataset and code for future benchmarking.
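The "diversity over size" idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code; the field names ("topic", "sentence", "label") and the loader in the usage comment are assumptions made purely for illustration. The sketch composes a reduced training set by spreading a fixed sample budget evenly across topics instead of drawing many samples from few topics.

```python
import random
from collections import defaultdict

# Minimal sketch (not the paper's released code): compose a small but
# topic-diverse training subset from a topic-dependent argument mining
# dataset. Each sample is assumed to be a dict with hypothetical keys
# "topic", "sentence", and "label".

def diverse_subsample(samples, budget, seed=42):
    """Pick up to `budget` samples, spreading the budget evenly over topics."""
    rng = random.Random(seed)

    # Group samples by topic.
    by_topic = defaultdict(list)
    for sample in samples:
        by_topic[sample["topic"]].append(sample)

    topics = list(by_topic)
    per_topic = max(1, budget // len(topics))

    # Draw the same number of random samples from every topic.
    subset = []
    for topic in topics:
        pool = by_topic[topic]
        rng.shuffle(pool)
        subset.extend(pool[:per_topic])

    rng.shuffle(subset)
    return subset[:budget]

# Example usage: keep roughly 10% of an (illustrative) 10,000-sample dataset.
# samples = load_tdam_dataset()            # hypothetical loader
# train_subset = diverse_subsample(samples, budget=1000)
```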

Item type: Conference publication
Published: 2024
Author(s): Schiller, Benjamin ; Daxenberger, Johannes ; Waldis, Andreas ; Gurevych, Iryna
Type of entry: Bibliography
Title: Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets
Language: English
Publication year: November 2024
Publisher: ACL
Book title: EMNLP 2024: The 2024 Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference
Event title: 29th Conference on Empirical Methods in Natural Language Processing
Event location: Miami, USA
Event date: 12.11.2024 - 16.11.2024
DOI: 10.18653/v1/2024.emnlp-main.608
URL / URN: https://aclanthology.org/2024.emnlp-main.608/

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 17 Dec 2024 11:35
Last modified: 17 Dec 2024 11:35