Daxenberger, Johannes ; Gurevych, Iryna (2012)
A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles.
Mumbai, India
Conference publication, Bibliography
Abstract
In this paper, we present a study of the collaborative writing process in Wikipedia. Our work is based on a corpus of 1,995 edits obtained from 891 article revisions in the English Wikipedia. We propose a 21-category classification scheme for edits based on Faigley and Witte’s (1981) model. Example edit categories include spelling error corrections and vandalism. In a manual multi-label annotation study with 3 annotators, we obtain an inter-annotator agreement of α = 0.67. We further analyze the distribution of edit categories for distinct stages in the revision history of 10 featured and 10 non-featured articles. Our results show that the information content in featured articles tends to become more stable after their promotion. In contrast, this is not true for non-featured articles. We make the resulting corpus and the annotation guidelines freely available.
Item type: | Conference publication |
---|---|
Published: | 2012 |
Author(s): | Daxenberger, Johannes ; Gurevych, Iryna |
Entry type: | Bibliography |
Title: | A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles |
Language: | English |
Date of publication: | December 2012 |
Book title: | Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012) |
Event location: | Mumbai, India |
URL / URN: | http://www.aclweb.org/anthology/C12-1044 |
Uncontrolled keywords: | UKP_p_TextAsProcess;reviewed;UKP_a_TexMinAn |
ID number: | TUD-CS-2012-0225 |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 31 Dec 2016 14:29 |
Last modified: | 24 Jan 2020 12:03 |