
Yes-Yes-Yes: Proactive Data Collection for ACL Rolling Review and Beyond

Dycke, Nils ; Kuznetsov, Ilia ; Gurevych, Iryna (2022)
Yes-Yes-Yes: Proactive Data Collection for ACL Rolling Review and Beyond.
2022 Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, UAE (7-11 December 2022)
Conference publication, bibliographic record

Abstract

The shift towards publicly available text sources has enabled language processing at unprecedented scale, yet leaves under-serviced the domains where public and openly licensed data is scarce. Proactively collecting text data for research is a viable strategy to address this scarcity, but lacks systematic methodology taking into account the many ethical, legal and confidentiality-related aspects of data collection. Our work presents a case study on proactive data collection in peer review – a challenging and under-resourced NLP domain. We outline ethical and legal desiderata for proactive data collection and introduce “Yes-Yes-Yes”, the first donation-based peer reviewing data collection workflow that meets these requirements. We report on the implementation of Yes-Yes-Yes at ACL Rolling Review and empirically study the implications of proactive data collection for the dataset size and the biases induced by the donation behavior on the peer reviewing platform.

Item type: Conference publication
Published: 2022
Author(s): Dycke, Nils ; Kuznetsov, Ilia ; Gurevych, Iryna
Entry type: Bibliographic record
Title: Yes-Yes-Yes: Proactive Data Collection for ACL Rolling Review and Beyond
Language: English
Publication date: December 2022
Publisher: ACL
Book title: Findings of the Association for Computational Linguistics: EMNLP 2022
Event title: 2022 Conference on Empirical Methods in Natural Language Processing
Event location: Abu Dhabi, UAE
Event date: 7-11 December 2022
URL / URN: https://aclanthology.org/2022.findings-emnlp.23
Uncontrolled keywords: UKP_p_InterText, UKP_p_seditrah_QABioLit
Department(s)/Division(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 27 Feb 2023 15:19
Last modified: 18 Jul 2023 18:31
PPN: 509499856