Tauchmann, Christopher ; Daxenberger, Johannes ; Mieskes, Margot (2020)
The Influence of Input Data Complexity on Crowdsourcing Quality.
IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion. Cagliari, Italy (17.03.2020-20.03.2020)
doi: 10.1145/3379336.3381499
Conference publication, Bibliography
Abstract
Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods that are applied only after the crowdsourcing process and thus cannot account for differences in worker qualification during data gathering. This is time-consuming and cost-inefficient, because some data points may have to be re-labeled or discarded. Training workers beforehand and distributing work according to worker qualification helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only those workers who successfully completed tasks of rising complexity to continue work on more difficult subsets. In this way, we train workers and at the same time exclude unqualified ones. In initial experiments, four annotations by qualified crowd workers achieve higher agreement than five annotations by random crowd workers on the same dataset.
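The staged qualification idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the accuracy-against-gold-labels criterion, the threshold, and all names below are illustrative assumptions; the paper only states that workers must successfully complete tasks of rising complexity before advancing to more difficult subsets.

```python
# Toy sketch of qualification-based task distribution (assumed design,
# not the paper's actual method): workers label subsets of rising
# complexity, and only those who pass each stage advance to the next.

def accuracy(worker_labels, gold_labels):
    """Fraction of items on which the worker matches the gold label."""
    correct = sum(w == g for w, g in zip(worker_labels, gold_labels))
    return correct / len(gold_labels)

def qualify_workers(workers, stages, threshold=0.8):
    """Filter workers through stages ordered from easiest to hardest.

    workers: worker id -> {stage name: list of labels}
    stages:  list of (stage name, gold labels), rising complexity
    Returns the set of worker ids that passed every stage.
    """
    qualified = set(workers)
    for stage_name, gold in stages:
        qualified = {
            w for w in qualified
            if accuracy(workers[w][stage_name], gold) >= threshold
        }
    return qualified

# Hypothetical example: w1 passes both stages, w2 fails the hard
# stage, w3 already fails the easy one.
stages = [("easy", [1, 0, 1, 1]), ("hard", [0, 0, 1, 0])]
workers = {
    "w1": {"easy": [1, 0, 1, 1], "hard": [0, 0, 1, 0]},
    "w2": {"easy": [1, 0, 1, 1], "hard": [1, 1, 0, 1]},
    "w3": {"easy": [0, 1, 0, 0], "hard": [0, 0, 1, 0]},
}
print(qualify_workers(workers, stages))  # only w1 remains qualified
```

Distributing further work only to the surviving set is what avoids the post-hoc re-labeling and discarding of data points that the abstract criticizes.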
Type of entry: | Conference publication |
---|---|
Published: | 2020 |
Author(s): | Tauchmann, Christopher ; Daxenberger, Johannes ; Mieskes, Margot |
Kind of entry: | Bibliography |
Title: | The Influence of Input Data Complexity on Crowdsourcing Quality |
Language: | English |
Year of publication: | 2020 |
Event title: | IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces Companion |
Event location: | Cagliari, Italy |
Event dates: | 17.03.2020-20.03.2020 |
DOI: | 10.1145/3379336.3381499 |
URL / URN: | https://dl.acm.org/doi/10.1145/3379336.3381499 |
Keywords: | Task distribution, Natural Language Processing, Crowdsourcing |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Artificial Intelligence and Machine Learning |
Date deposited: | 21 Apr 2020 14:49 |
Last modified: | 28 Apr 2020 07:40 |