
TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation

Stangier, Lorenz ; Lee, Ji-Ung ; Wang, Yuxi ; Müller, Marvin ; Frick, Nicholas ; Metternich, Joachim ; Gurevych, Iryna (2022)
TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation.
doi: 10.48550/arXiv.2208.07846
Report, Bibliography

Abstract

Collecting and annotating task-oriented dialog data is difficult, especially for highly specific domains that require expert knowledge. At the same time, informal communication channels such as instant messengers are increasingly being used at work. As a result, a lot of work-relevant information is disseminated through those channels and needs to be post-processed manually by the employees. To alleviate this problem, we present TexPrax, a messaging system to collect and annotate problems, causes, and solutions that occur in work-related chats. TexPrax uses a chatbot to directly engage the employees to provide lightweight annotations on their conversation and ease their documentation work. To comply with data privacy and security regulations, we use end-to-end message encryption and give our users full control over their data, which has various advantages over conventional annotation tools. We evaluate TexPrax in a user study with German factory employees who ask their colleagues for solutions to problems that arise during their daily work. Overall, we collect 202 task-oriented German dialogues containing 1,027 sentences with sentence-level expert annotations. Our data analysis also reveals that real-world conversations frequently contain instances of code-switching, varying abbreviations for the same entity, and dialects, which NLP systems should be able to handle.

Item type: Report
Published: 2022
Author(s): Stangier, Lorenz ; Lee, Ji-Ung ; Wang, Yuxi ; Müller, Marvin ; Frick, Nicholas ; Metternich, Joachim ; Gurevych, Iryna
Type of entry: Bibliography
Title: TexPrax: A Messaging Application for Ethical, Real-time Data Collection and Annotation
Language: English
Publication date: 20 October 2022
Publisher: arXiv
Series: Computation and Language
Edition: 1st edition
DOI: 10.48550/arXiv.2208.07846
URL / URN: https://arxiv.org/abs/2208.07846

Additional information:

Preprint

Department(s)/field(s): 16 Department of Mechanical Engineering
16 Department of Mechanical Engineering > Institut für Produktionsmanagement und Werkzeugmaschinen (PTW)
Central Facilities
Central Facilities > hessian.AI - Hessisches Zentrum für Künstliche Intelligenz
Date deposited: 13 Dec 2022 13:09
Last modified: 22 Jul 2024 12:06
PPN: 507190467