Vogel, Liane ; Flek, Lucie
Eds.: Sojka, Petr ; Horak, Ales ; Kopecek, Ivan ; Pala, Karel (2022)
Investigating Paraphrasing-Based Data Augmentation for Task-Oriented Dialogue Systems.
25th International Conference on Text, Speech, and Dialogue. Brno, Czech Republic (06.09.2022-09.09.2022)
doi: 10.1007/978-3-031-16270-1_39
Conference publication, Bibliography
Abstract
With synthetic data generation, the required amount of human-generated training data can be reduced significantly. In this work, we explore the usage of automatic paraphrasing models such as GPT-2 and CVAE to augment template phrases for task-oriented dialogue systems while preserving the slots. Additionally, we systematically analyze how far manually annotated training data can be reduced. We extrinsically evaluate the performance of a natural language understanding system on augmented data on various levels of data availability, reducing manually written templates by up to 75% while preserving the same level of accuracy. We further point out that the typical NLG quality metrics such as BLEU or utterance similarity are not suitable to assess the intrinsic quality of NLU paraphrases, and that public task-oriented NLU datasets such as ATIS and SNIPS have severe limitations.
Item type: | Conference publication |
---|---|
Published: | 2022 |
Editors: | Sojka, Petr ; Horak, Ales ; Kopecek, Ivan ; Pala, Karel |
Author(s): | Vogel, Liane ; Flek, Lucie |
Type of entry: | Bibliography |
Title: | Investigating Paraphrasing-Based Data Augmentation for Task-Oriented Dialogue Systems |
Language: | English |
Publication date: | 16 September 2022 |
Publisher: | Springer |
Book title: | Text, Speech, and Dialogue |
Series: | Lecture Notes in Computer Science |
Series volume: | 13502 |
Event title: | 25th International Conference on Text, Speech, and Dialogue |
Event location: | Brno, Czech Republic |
Event dates: | 06.09.2022-09.09.2022 |
DOI: | 10.1007/978-3-031-16270-1_39 |
Division(s)/Department(s): | 20 Department of Computer Science ; 20 Department of Computer Science > Data and AI Systems |
Date deposited: | 08 Feb 2023 09:06 |
Last modified: | 11 May 2023 15:17 |
PPN: | 507740270 |