Automatically Classifying Edit Categories in Wikipedia Revisions

Daxenberger, Johannes and Gurevych, Iryna
Yarowsky, David and Baldwin, Timothy and Korhonen, Anna and Livescu, Karen and Bethard, Steven (eds.) (2013):
Automatically Classifying Edit Categories in Wikipedia Revisions.
In: Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), Association for Computational Linguistics, Seattle, WA, USA, [Online-Edition: http://www.aclweb.org/anthology/D13-1055],
[Conference or Workshop Item]

Abstract

In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including metadata, textual and language properties and markup. In a supervised machine learning experiment, we achieve a micro-averaged F1 score of .62 on a corpus of edits from the English Wikipedia. In this corpus, each edit has been multi-labeled according to a 21-category taxonomy. A model trained on the same data achieves state-of-the-art performance on the related task of fluency edit classification. We apply pattern mining to automatically labeled edits in the revision histories of different Wikipedia articles. Our results suggest that high-quality articles show a higher degree of homogeneity with respect to their collaboration patterns as compared to random articles.
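The abstract reports a micro-averaged F1 score over multi-labeled edits. As a rough illustration only (this is not the authors' evaluation code), micro-averaging pools true positives, false positives and false negatives across all edits and all categories before computing precision and recall. The function name micro_f1 and the category labels in the example are hypothetical.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label data: one label set per edit (hypothetical helper)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain one label set per edit")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # categories predicted and present in the gold labels
        fp += len(p - g)  # categories predicted but not in the gold labels
        fn += len(g - p)  # gold categories the prediction missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three edits, each with a set of category labels.
gold = [{"spelling"}, {"paraphrase", "markup"}, {"vandalism"}]
pred = [{"spelling"}, {"paraphrase"}, {"paraphrase"}]
print(micro_f1(gold, pred))  # tp=2, fp=1, fn=2, so F1 = 4/7
```

Because the counts are pooled before averaging, frequent categories weigh more heavily in the micro-averaged score than rare ones.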

Item Type: Conference or Workshop Item
Published: 2013
Editors: Yarowsky, David and Baldwin, Timothy and Korhonen, Anna and Livescu, Karen and Bethard, Steven
Creators: Daxenberger, Johannes and Gurevych, Iryna
Title: Automatically Classifying Edit Categories in Wikipedia Revisions
Language: English

Title of Book: Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)
Publisher: Association for Computational Linguistics
Uncontrolled Keywords: UKP_p_TextAsProcess;reviewed;UKP_a_WALL;UKP_a_TexMinAn
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Event Location: Seattle, WA, USA
Date Deposited: 31 Dec 2016 14:29
Official URL: http://www.aclweb.org/anthology/D13-1055
Identification Number: TUD-CS-2013-0259