
Authorship Verification via k-Nearest Neighbor Estimation

Halvani, Oren ; Steinebach, Martin ; Zimmermann, Ralf
Forner, Pamela ; Navigli, Roberto ; Tufis, Dan ; Ferro, Nicola (eds.):

Authorship Verification via k-Nearest Neighbor Estimation.
In: CEUR Workshop Proceedings (1179). CEUR-WS.org
[Conference publication], (2013)

Abstract

In this paper we describe our k-Nearest Neighbor (k-NN) based Authorship Verification method for the Author Identification (AI) task of the PAN 2013 challenge. The method is an ensemble classification technique that combines suitable feature categories. For each chosen feature category we apply a k-NN classifier to calculate a style deviation score between the training documents of the true author A and a document whose author claims to be A. Depending on this score and a given threshold, a decision for or against the alleged author is generated and stored in a list. The final decision regarding the alleged authorship is then determined by a majority vote over all decisions in this list. The method offers several benefits. First, it is independent of linguistic resources such as ontologies, thesauri or even language models. Second, it is language-independent across Indo-European languages, being applicable to languages such as Spanish, English, Greek and German. Third, its runtime is low, since it requires no deep linguistic processing such as POS tagging, chunking or parsing. Moreover, the method can be extended or modified, for instance by replacing the classification function, the threshold or the underlying features including their parameters (e.g. n-gram sizes or the number of feature frequencies considered). On the PAN 2013 AI training corpus the method achieved an overall accuracy of 80%; on our own dataset it achieved an accuracy of 77.50%.
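The abstract outlines the decision pipeline (per-category k-NN deviation score, threshold, majority vote) but does not give the features, distance or score formula. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation: each feature category is assumed to be a character n-gram size, documents are represented by relative frequencies of the `top` most frequent n-grams of A's training documents, distances are Manhattan, and the deviation score is the mean distance from the questioned document to its k nearest known documents divided by the mean nearest-neighbour distance among the known documents. The names `deviation_score` and `verify`, the n-gram sizes (2, 3, 4) and the threshold 1.5 are all hypothetical.

```python
from collections import Counter
from statistics import mean


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams (no tokenizer or language resources needed)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def profile(text: str, n: int, vocab: list[str]) -> list[float]:
    """Relative frequencies of the vocabulary n-grams in `text`."""
    counts = char_ngrams(text, n)
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocab]


def manhattan(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def deviation_score(known: list[str], unknown: str, n: int,
                    top: int = 200, k: int = 1) -> float:
    """Hypothetical k-NN style deviation score for one feature category (n-gram size n).

    ASSUMPTION: score = mean distance from `unknown` to its k nearest known documents,
    normalized by the mean nearest-neighbour distance among the known documents.
    """
    if not known:
        raise ValueError("need at least one training document of author A")
    # Vocabulary: the `top` most frequent n-grams pooled over A's training documents.
    pooled = Counter()
    for doc in known:
        pooled.update(char_ngrams(doc, n))
    vocab = [g for g, _ in pooled.most_common(top)]

    known_vecs = [profile(d, n, vocab) for d in known]
    unknown_vec = profile(unknown, n, vocab)

    # Mean distance from the questioned document to its k nearest known documents.
    dists = sorted(manhattan(unknown_vec, v) for v in known_vecs)
    unknown_dev = mean(dists[:k])

    # Baseline: typical distance of a known document to its nearest other known document.
    if len(known_vecs) > 1:
        baseline = mean(
            min(manhattan(v, w) for j, w in enumerate(known_vecs) if j != i)
            for i, v in enumerate(known_vecs)
        )
    else:
        baseline = 1.0
    return unknown_dev / max(baseline, 1e-12)


def verify(known: list[str], unknown: str,
           ngram_sizes: tuple[int, ...] = (2, 3, 4), threshold: float = 1.5) -> bool:
    """Accept the claimed authorship if a majority of per-category decisions accept it."""
    # One thresholded decision per feature category, collected in a list.
    decisions = [deviation_score(known, unknown, n) <= threshold for n in ngram_sizes]
    # Majority vote; an odd number of categories avoids ties.
    return sum(decisions) > len(decisions) / 2
```

For example, with `known = ["the cat sat on the mat and the cat was happy", "the cat sat on the mat and the dog was happy", "the cat sat on the mat and the cow was happy"]`, calling `verify(known, known[0])` accepts the claim, while `verify(known, "zzzz qqqq xxxx")` rejects it. Swapping the classification function, the threshold or the feature parameters only requires changing `deviation_score`, `threshold` or `ngram_sizes`, which mirrors the extensibility the abstract describes.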

Entry type: Conference publication
Published: 2013
Editors: Forner, Pamela ; Navigli, Roberto ; Tufis, Dan ; Ferro, Nicola
Author(s): Halvani, Oren ; Steinebach, Martin ; Zimmermann, Ralf
Title: Authorship Verification via k-Nearest Neighbor Estimation
Book title: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013.
Series: CEUR Workshop Proceedings
Volume (issue) number: 1179
Publisher: CEUR-WS.org
Keywords: Secure Data; Authorship Verification; One-class classification
Department(s)/field(s): LOEWE > LOEWE Centers > CASED – Center for Advanced Security Research Darmstadt
LOEWE > LOEWE Centers
LOEWE
Date deposited: 30 Dec 2016 20:23
ID number: TUD-CS-2013-0162