A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles

Daxenberger, Johannes and Gurevych, Iryna (2012):
A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles.
In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, pp. 711-726. [Online edition: http://www.aclweb.org/anthology/C12-1044]
[Conference or Workshop Item]

Abstract

In this paper, we present a study of the collaborative writing process in Wikipedia. Our work is based on a corpus of 1,995 edits obtained from 891 article revisions in the English Wikipedia. We propose a 21-category classification scheme for edits based on Faigley and Witte’s (1981) model. Example edit categories include spelling error corrections and vandalism. In a manual multi-label annotation study with 3 annotators, we obtain an inter-annotator agreement of α = 0.67. We further analyze the distribution of edit categories for distinct stages in the revision history of 10 featured and 10 non-featured articles. Our results show that the information content in featured articles tends to become more stable after their promotion. By contrast, this is not the case for non-featured articles. We make the resulting corpus and the annotation guidelines freely available.

Item Type: Conference or Workshop Item
Published: 2012
Creators: Daxenberger, Johannes and Gurevych, Iryna
Title: A Corpus-Based Study of Edit Categories in Featured and Non-Featured Wikipedia Articles
Language: English
Title of Book: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)
Uncontrolled Keywords: UKP_p_TextAsProcess;reviewed;UKP_a_TexMinAn
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Event Location: Mumbai, India
Date Deposited: 31 Dec 2016 14:29
Official URL: http://www.aclweb.org/anthology/C12-1044
Identification Number: TUD-CS-2012-0225