Authorship Verification via k-Nearest Neighbor Estimation

Halvani, Oren and Steinebach, Martin and Zimmermann, Ralf
Forner, Pamela and Navigli, Roberto and Tufis, Dan and Ferro, Nicola (eds.):

Authorship Verification via k-Nearest Neighbor Estimation.
In: CEUR - Workshop Proceedings (1179). CEUR-WS.org
[Conference or Workshop Item], (2013)

Abstract

In this paper we describe our k-Nearest Neighbor (k-NN) based authorship verification method for the Author Identification (AI) task of the PAN 2013 challenge. The method follows an ensemble classification technique that combines suitable feature categories. For each chosen feature category we apply a k-NN classifier to calculate a style deviation score between the training documents of the true author A and the document of an author who claims to be A. Depending on the score and a given threshold, a decision for or against the alleged author is generated and stored in a list. The final decision regarding the alleged authorship is then determined by a majority vote among all decisions in this list. The method offers a number of benefits, for instance its independence from linguistic resources such as ontologies, thesauri or even language models. A further benefit is language independence across different Indo-European languages: the approach is applicable to languages such as Spanish, English, Greek or German. Another benefit is the low runtime of the method, since there is no need for deep linguistic processing such as POS tagging, chunking or parsing. Moreover, the method can be extended or modified, for instance by replacing the classification function, the threshold or the underlying features including their parameters (e.g. n-gram sizes or the number of feature frequencies). On the PAN 2013 AI training corpus we achieved an overall accuracy of 80%; we also evaluated the algorithm on our own dataset, achieving an accuracy of 77.50%.
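
The pipeline outlined in the abstract (one k-NN deviation score per feature category, a threshold decision per category, then a majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the feature categories, the distance function, the value of k, or the threshold, so the following are all assumptions made only for illustration: character n-gram profiles as feature categories, a Manhattan distance, and the names `ngram_profile`, `deviation_score` and `verify` with their default parameters.

```python
from collections import Counter


def ngram_profile(text, n):
    """Relative frequencies of character n-grams (assumed feature category)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1  # avoid division by zero for short texts
    return {gram: c / total for gram, c in counts.items()}


def manhattan(p, q):
    """Manhattan distance between two sparse profiles (assumed distance)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def deviation_score(known_docs, unknown_doc, n, k=1):
    """Mean distance from the unknown document to its k nearest known documents."""
    if not known_docs:
        raise ValueError("at least one known document is required")
    unknown = ngram_profile(unknown_doc, n)
    dists = sorted(manhattan(ngram_profile(d, n), unknown) for d in known_docs)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def verify(known_docs, unknown_doc, ngram_sizes=(2, 3, 4), threshold=1.0, k=1):
    """One accept/reject decision per feature category, then a majority vote."""
    decisions = [
        deviation_score(known_docs, unknown_doc, n, k) <= threshold
        for n in ngram_sizes
    ]
    accepted = sum(decisions) > len(decisions) / 2
    return accepted, decisions
```

In this sketch, an unknown text that is identical to one of the known documents has a deviation of 0 in every category and is therefore accepted. In practice the threshold would have to be tuned on training data, and an odd number of feature categories avoids ties in the vote.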

Item Type: Conference or Workshop Item
Published: 2013
Editors: Forner, Pamela and Navigli, Roberto and Tufis, Dan and Ferro, Nicola
Creators: Halvani, Oren and Steinebach, Martin and Zimmermann, Ralf
Title: Authorship Verification via k-Nearest Neighbor Estimation
Title of Book: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013.
Series Name: CEUR - Workshop Proceedings
Number: 1179
Publisher: CEUR-WS.org
Uncontrolled Keywords: Secure Data; Authorship Verification; One-class classification
Divisions: LOEWE > LOEWE-Zentren > CASED – Center for Advanced Security Research Darmstadt
LOEWE > LOEWE-Zentren
LOEWE
Date Deposited: 30 Dec 2016 20:23
Identification Number: TUD-CS-2013-0162