
Automatically Classifying Edit Categories in Wikipedia Revisions

Daxenberger, Johannes ; Gurevych, Iryna
Eds.: Yarowsky, David ; Baldwin, Timothy ; Korhonen, Anna ; Livescu, Karen ; Bethard, Steven (2013)
Automatically Classifying Edit Categories in Wikipedia Revisions.
Seattle, WA, USA
Conference publication, Bibliography

Abstract

In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machine learning experiment, we achieve a micro-averaged F1 score of .62 on a corpus of edits from the English Wikipedia. In this corpus, each edit has been multi-labeled according to a 21-category taxonomy. A model trained on the same data achieves state-of-the-art performance on the related task of fluency edit classification. We apply pattern mining to automatically labeled edits in the revision histories of different Wikipedia articles. Our results suggest that high-quality articles show a higher degree of homogeneity with respect to their collaboration patterns as compared to random articles.
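The micro-averaged F1 score reported in the abstract pools true positives, false positives and false negatives over all edits and all categories before computing F1. The following is a minimal sketch of that metric for multi-labeled edits, not the authors' evaluation code; the function name `micro_f1` and the toy label sets are assumptions made for illustration only.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label data.

    Each element of `gold` and `predicted` is the set of category labels
    assigned to one edit. Counts are pooled over all edits and labels.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # labels predicted but not in gold
        fn += len(gold_labels - pred_labels)   # gold labels that were missed

    denominator = 2 * tp + fp + fn
    # With no labels at all, F1 is undefined; return 0.0 by convention here.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical toy data: three edits, label names chosen for illustration.
gold = [{"spelling"}, {"paraphrase", "spelling"}, {"vandalism"}]
pred = [{"spelling"}, {"paraphrase"}, {"paraphrase"}]

score = micro_f1(gold, pred)
print(round(score, 3))
```

Because counts are pooled before averaging, frequent categories influence the micro-averaged score more than rare ones, which matters for a taxonomy whose categories differ widely in frequency.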

Item type: Conference publication
Published: 2013
Editors: Yarowsky, David ; Baldwin, Timothy ; Korhonen, Anna ; Livescu, Karen ; Bethard, Steven
Author(s): Daxenberger, Johannes ; Gurevych, Iryna
Entry type: Bibliography
Title: Automatically Classifying Edit Categories in Wikipedia Revisions
Language: English
Publication date: October 2013
Publisher: Association for Computational Linguistics
Book title: Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)
Event location: Seattle, WA, USA
URL / URN: http://www.aclweb.org/anthology/D13-1055
Uncontrolled keywords: UKP_p_TextAsProcess;reviewed;UKP_a_WALL;UKP_a_TexMinAn
ID number: TUD-CS-2013-0259
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 24 Jan 2020 12:03