The Influence of Input Data Complexity on Crowdsourcing Quality

Tauchmann, Christopher ; Daxenberger, Johannes ; Mieskes, Margot (2020)
The Influence of Input Data Complexity on Crowdsourcing Quality.
IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion. Cagliari, Italy (March 17–20, 2020)
doi: 10.1145/3379336.3381499
Conference publication, Bibliography

Abstract

Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods that are applied only after the crowdsourcing process and thus cannot account for differing worker qualifications during data gathering. This is time-consuming and costly, because some data points may have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only those workers who successfully completed tasks of rising complexity to continue on more difficult subsets. In this way, we train workers and at the same time exclude unqualified ones. In initial experiments, four annotations by qualified crowd workers achieve higher agreement than five annotations by random crowd workers on the same dataset.
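
A minimal Python sketch of the complexity-gated qualification pipeline described above, assuming a threshold-based check against gold labels; the Worker class, the collect_labels callback, and the 0.8 agreement threshold are illustrative assumptions, not details taken from the paper.

    from dataclasses import dataclass

    @dataclass
    class Worker:
        worker_id: str
        passed_levels: int = 0  # highest complexity level completed so far

    def agreement_with_gold(labels, gold):
        # Fraction of a worker's labels that match the gold annotations.
        return sum(1 for a, b in zip(labels, gold) if a == b) / len(gold)

    def qualify_workers(workers, levels, collect_labels, threshold=0.8):
        # `levels` is a list of (tasks, gold_labels) pairs ordered from the
        # easiest to the most complex subset; `collect_labels(worker, tasks)`
        # returns that worker's labels, e.g. from a crowdsourcing platform.
        active = list(workers)
        for level, (tasks, gold) in enumerate(levels, start=1):
            survivors = []
            for worker in active:
                labels = collect_labels(worker, tasks)
                if agreement_with_gold(labels, gold) >= threshold:
                    worker.passed_levels = level
                    survivors.append(worker)
            active = survivors  # unqualified workers are excluded here
        return active  # workers qualified for the most difficult subset

Only the workers returned by qualify_workers would then annotate the hardest subset, mirroring the train-and-filter idea of the proposed setup.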

Entry type: Conference publication
Published: 2020
Author(s): Tauchmann, Christopher ; Daxenberger, Johannes ; Mieskes, Margot
Type of record: Bibliography
Title: The Influence of Input Data Complexity on Crowdsourcing Quality
Language: English
Year of publication: 2020
Event title: IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion
Event venue: Cagliari, Italy
Event date: March 17–20, 2020
DOI: 10.1145/3379336.3381499
URL / URN: https://dl.acm.org/doi/10.1145/3379336.3381499

Uncontrolled keywords: Task distribution, Natural Language Processing, Crowdsourcing
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Artificial Intelligence and Machine Learning
Date deposited: 21 Apr 2020 14:49
Last modified: 28 Apr 2020 07:40