The Influence of Input Data Complexity on Crowdsourcing Quality

Tauchmann, Christopher and Daxenberger, Johannes and Mieskes, Margot (2020):
The Influence of Input Data Complexity on Crowdsourcing Quality.
In: IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion, Cagliari, Italy, March 17–20, 2020, pp. 71–72, ISBN 978-1-4503-7513-9.
DOI: 10.1145/3379336.3381499
[Conference or Workshop Item]

Abstract

Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods that are employed only after the crowdsourcing process and thus cannot account for differing worker qualifications during data gathering. This is time-consuming and cost-ineffective, because some data points may have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only workers who have successfully completed tasks of rising complexity to continue on more difficult subsets. In this way, we train workers and at the same time exclude unqualified ones. In initial experiments, our method achieves higher agreement with four annotations from qualified crowd workers than with five annotations from random crowd workers on the same dataset.
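
To make the proposed setup concrete, here is a minimal sketch (in Python) of complexity-gated worker qualification: workers advance through annotation tasks of rising complexity, and only those whose accuracy on gold-labeled items at the current level meets a threshold are admitted to the next, more difficult subset. The function name, the data layout, the 0.8 threshold, and the gold-label pass criterion are illustrative assumptions; the paper does not spell out its exact qualification test.

# Hypothetical sketch of complexity-gated worker qualification.
# The data layout, threshold, and gold-label check are assumptions,
# not the authors' exact criterion.

def gate_workers(responses, gold, levels, threshold=0.8):
    """Return the set of worker IDs that pass every complexity level.

    responses: dict mapping (worker_id, level) -> {item_id: label}
    gold:      dict mapping level -> {item_id: gold_label}
    levels:    list of level names ordered by rising complexity
    """
    qualified = {wid for (wid, _level) in responses}
    for level in levels:
        passed = set()
        for wid in qualified:
            answers = responses.get((wid, level), {})
            if not answers:
                continue  # worker skipped this level: drop them
            correct = sum(gold[level].get(item) == label
                          for item, label in answers.items())
            if correct / len(answers) >= threshold:
                passed.add(wid)
        qualified = passed  # only passing workers see the next level
    return qualified

if __name__ == "__main__":
    gold = {
        "easy":   {"d1": "A", "d2": "B"},
        "medium": {"d3": "A", "d4": "A"},
    }
    responses = {
        ("w1", "easy"):   {"d1": "A", "d2": "B"},  # accurate worker
        ("w1", "medium"): {"d3": "A", "d4": "A"},
        ("w2", "easy"):   {"d1": "B", "d2": "B"},  # 50% on easy: excluded
        ("w2", "medium"): {"d3": "A", "d4": "B"},
    }
    print(gate_workers(responses, gold, ["easy", "medium"]))  # {'w1'}

Running the demo prints {'w1'}: the worker who labeled the easy subset accurately continues to the medium subset, while the inaccurate worker is excluded early, before effort is wasted on difficult items. A filter of this kind is what makes the smaller, qualified annotator pool in the paper's evaluation possible.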

Item Type: Conference or Workshop Item
Published: 2020
Creators: Tauchmann, Christopher and Daxenberger, Johannes and Mieskes, Margot
Title: The Influence of Input Data Complexity on Crowdsourcing Quality
Language: English
ISBN: 978-1-4503-7513-9
Uncontrolled Keywords: Task distribution, Natural Language Processing, Crowdsourcing
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Artificial Intelligence and Machine Learning
Event Title: IUI '20: 25th International Conference on Intelligent User Interfaces
Event Location: Cagliari, Italy
Event Dates: March 17–20, 2020
Date Deposited: 21 Apr 2020 14:49
DOI: 10.1145/3379336.3381499
Official URL: https://dl.acm.org/doi/10.1145/3379336.3381499