
Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

Bayer, Markus ; Kaufhold, Marc-André ; Buchhold, Björn ; Keller, Marcel ; Dallmeyer, Jörg ; Reuter, Christian (2021)
Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers.
doi: 10.48550/arXiv.2103.14453
Report, Bibliography

Abstract

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve classifiers by means of artificially created training data. In NLP, the challenge is to establish universal rules for text transformations that provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable for increasing the performance of classifiers for long and short texts. We achieved promising improvements when evaluating both short and long text tasks enhanced by our text generation method. Especially with regard to small data analytics, additive accuracy gains of up to 15.53% and 3.56% are achieved within a constructed low data regime, compared to the no-augmentation baseline and another data augmentation technique. As such constructed regimes are not universally applicable, we also show major improvements in several real-world low data tasks (up to +4.84 F1-score). Since we evaluate the method from many perspectives (11 datasets in total), we also observe situations where the method might not be suitable. We discuss implications and patterns for the successful application of our approach to different types of datasets.
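The general idea summarised above — enlarging a small labelled training set with artificially generated texts before fitting a classifier — can be sketched as follows. This is a minimal illustration only, not the authors' exact method; the model choice (GPT-2 via the Hugging Face transformers pipeline), the prompting scheme, and all hyperparameters are assumptions made for the example.

# Minimal sketch of text-generation-based data augmentation (illustrative only,
# not the method proposed in the paper): continuations from a pretrained language
# model are added to a tiny labelled seed set, then a simple classifier is trained
# on the union. GPT-2 and all hyperparameters below are assumptions.
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny labelled seed set standing in for a low data regime.
train_texts = ["The plot was dull and the acting was wooden.",
               "A wonderful film with a gripping story."]
train_labels = ["negative", "positive"]

generator = pipeline("text-generation", model="gpt2")

aug_texts, aug_labels = [], []
for text, label in zip(train_texts, train_labels):
    # Use the original example as a prompt so the continuation stays close
    # to the wording of its class; the generated part inherits the label.
    for out in generator(text, max_new_tokens=30, num_return_sequences=2,
                         do_sample=True, top_p=0.9, pad_token_id=50256):
        continuation = out["generated_text"][len(text):].strip()
        if continuation:
            aug_texts.append(continuation)
            aug_labels.append(label)

# Fit a simple classifier on the original plus the generated examples.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_texts + aug_texts)
clf = LogisticRegression(max_iter=1000).fit(X, train_labels + aug_labels)
print(clf.predict(vectorizer.transform(["An enjoyable and moving movie."])))

In practice, generated texts need to be checked for label consistency; as the abstract notes, the evaluation across 11 datasets also covers situations in which such augmentation does not help.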

Entry type: Report
Published: 2021
Author(s): Bayer, Markus ; Kaufhold, Marc-André ; Buchhold, Björn ; Keller, Marcel ; Dallmeyer, Jörg ; Reuter, Christian
Entry category: Bibliography
Title: Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers
Language: English
Publication date: 26 March 2021
Publisher: arXiv
Series: Computation and Language
Collation: 17 pages
DOI: 10.48550/arXiv.2103.14453
URL / URN: https://arxiv.org/abs/2103.14453

Additional information:

1st version

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Science and Technology for Peace and Security (PEASEC)
Research fields
Research fields > Information and Intelligence
Research fields > Information and Intelligence > Cybersecurity & Privacy
Date deposited: 15 Nov 2021 10:31
Last modified: 19 Dec 2024 10:50