
Authorship Verification via k-Nearest Neighbor Estimation

Halvani, Oren ; Steinebach, Martin ; Zimmermann, Ralf
Eds.: Forner, Pamela ; Navigli, Roberto ; Tufis, Dan ; Ferro, Nicola (2013)
Authorship Verification via k-Nearest Neighbor Estimation.
Conference publication, Bibliography

Abstract

In this paper we describe our k-Nearest Neighbor (k-NN) based Authorship Verification method for the Author Identification (AI) task of the PAN 2013 challenge. The method follows an ensemble classification technique based on the combination of suitable feature categories. For each chosen feature category we apply a k-NN classifier to calculate a style deviation score between the training documents of the true author A and the document of an author who claims to be A. Depending on the score and a given threshold, a decision for or against the alleged author is generated and stored in a list. The final decision regarding the alleged authorship is then determined through a majority vote among all decisions in this list. The method provides a number of benefits, for instance independence from linguistic resources such as ontologies, thesauri or even language models. A further benefit is language independence across different Indo-European languages, as the approach is applicable to languages such as Spanish, English, Greek or German. Another benefit is the low runtime of the method, since there is no need for deep linguistic processing such as POS tagging, chunking or parsing. Moreover, the method can be extended or modified, for instance by replacing the classification function, the threshold or the underlying features including their parameters (e.g. n-gram sizes or the number of feature frequencies). On the PAN 2013 AI training corpus we achieved an overall accuracy of 80%; on our own dataset the algorithm achieved an accuracy of 77.50%.
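
The pipeline outlined in the abstract (one score per feature category, a threshold decision per score, then a majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the character n-gram profiles, the Euclidean distance, the threshold value, and the function names `char_ngram_profile`, `deviation_score`, and `verify` are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter
from math import sqrt


def char_ngram_profile(text, n):
    """Relative frequencies of character n-grams (an assumed feature category)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams) or 1
    return {g: c / total for g, c in Counter(grams).items()}


def euclidean(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(key, 0.0) - q.get(key, 0.0)) ** 2 for key in keys))


def deviation_score(known_docs, unknown_doc, n, k=1):
    """Mean distance from the unknown document to its k nearest known documents."""
    if not known_docs:
        raise ValueError("at least one known document is required")
    unknown = char_ngram_profile(unknown_doc, n)
    distances = sorted(
        euclidean(char_ngram_profile(doc, n), unknown) for doc in known_docs
    )
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def verify(known_docs, unknown_doc, ngram_sizes=(2, 3, 4), threshold=0.1, k=1):
    """One accept/reject decision per feature category, then a majority vote.

    The threshold of 0.1 is an arbitrary placeholder, not a value from the paper.
    """
    decisions = [
        deviation_score(known_docs, unknown_doc, n, k) <= threshold
        for n in ngram_sizes
    ]
    accepted = sum(decisions) > len(decisions) / 2
    return accepted, decisions
```

Calling `verify(known_docs, unknown_doc)` returns the overall decision together with the per-category decisions. Using an odd number of feature categories avoids ties in the vote; in this sketch a tie counts as a rejection.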

Item type: Conference publication
Published: 2013
Editors: Forner, Pamela ; Navigli, Roberto ; Tufis, Dan ; Ferro, Nicola
Author(s): Halvani, Oren ; Steinebach, Martin ; Zimmermann, Ralf
Entry type: Bibliography
Title: Authorship Verification via k-Nearest Neighbor Estimation
Language: English
Date of publication: September 2013
Publisher: CEUR-WS.org
(Issue) number: 1179
Book title: Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013.
Series: CEUR - Workshop Proceedings

Uncontrolled keywords: Secure Data; Authorship Verification; One-class classification
ID number: TUD-CS-2013-0162
Department(s)/area(s): LOEWE > LOEWE-Zentren > CASED – Center for Advanced Security Research Darmstadt
LOEWE > LOEWE-Zentren
LOEWE
Date deposited: 30 Dec 2016 20:23
Last modified: 17 May 2018 13:02