
A Generic Authorship Verification Scheme Based on Equal Error Rates (Notebook for PAN at CLEF 2015)

Halvani, Oren ; Winter, Christian
Eds.: Cappellato, Linda ; Ferro, Nicola ; Jones, Gareth ; Juan, Eric San (2015)
A Generic Authorship Verification Scheme Based on Equal Error Rates (Notebook for PAN at CLEF 2015).
Toulouse, France
Conference publication, Bibliography

Abstract

We present a generic authorship verification scheme for the PAN-2015 identification task. Our scheme uses a two-step training phase on the training corpora. The first step learns individual feature category parameters as well as decision thresholds based on equal error rates. The second step builds feature category ensembles, which are used for majority vote decisions because ensembles can outperform single feature categories. All feature categories used in our method are deliberately simple, which yields several advantages: our method is entirely independent of external linguistic resources (even word lists) and can therefore easily be applied to many languages. Moreover, classification is very fast because the features are simple, and we additionally make use of parallelization. Evaluating our scheme on a 40% split of the official PAN-2015 training corpus (which we did not use for training) led to an average corpus accuracy of 68.12%: 60% for the Dutch, 67.5% for the English, 60% for the Greek and 85% for the Spanish subcorpus. The overall computation runtime was approximately 27 seconds.
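
The two ideas named in the abstract, a decision threshold chosen at the equal error rate and a majority vote over several feature categories, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how scores are computed, so the sketch assumes each feature category yields a distance score (smaller means more likely the same author), and the function names `eer_threshold` and `majority_vote` as well as the toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def eer_threshold(same_author: Sequence[float],
                  diff_author: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, approximate EER) for one feature category.

    Assumption: a problem is accepted as same-author when its distance
    is <= threshold. The threshold is picked where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest to each other.
    """
    best_t, best_gap, best_eer = 0.0, float("inf"), 1.0
    for t in sorted(set(same_author) | set(diff_author)):
        # FRR: same-author problems wrongly rejected at threshold t
        frr = sum(d > t for d in same_author) / len(same_author)
        # FAR: different-author problems wrongly accepted at threshold t
        far = sum(d <= t for d in diff_author) / len(diff_author)
        gap = abs(far - frr)
        if gap < best_gap:
            best_t, best_gap, best_eer = t, gap, (far + frr) / 2
    return best_t, best_eer


def majority_vote(distances: Sequence[float],
                  thresholds: Sequence[float]) -> bool:
    """Accept as same-author if more than half of the categories accept."""
    votes = sum(d <= t for d, t in zip(distances, thresholds))
    return votes * 2 > len(thresholds)


# Toy training scores for a single (hypothetical) feature category
same = [0.1, 0.2, 0.3, 0.6]
diff = [0.4, 0.7, 0.8, 0.9]
threshold, eer = eer_threshold(same, diff)

# Toy ensemble of three categories that share the learned threshold
decision = majority_vote([0.3, 0.5, 0.2], [threshold] * 3)
```

Using an odd number of categories in the ensemble avoids ties in the vote; with an even number, the sketch above rejects on a tie.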

Entry type: Conference publication
Published: 2015
Editors: Cappellato, Linda ; Ferro, Nicola ; Jones, Gareth ; Juan, Eric San
Author(s): Halvani, Oren ; Winter, Christian
Type of entry: Bibliography
Title: A Generic Authorship Verification Scheme Based on Equal Error Rates (Notebook for PAN at CLEF 2015)
Language: English
Publication date: August 2015
Publisher: Sun SITE Central Europe
Book title: CLEF2015 Working Notes, Working Notes of CLEF 2015 – Conference and Labs of the Evaluation forum, Toulouse, France, September 8–11, 2015
Series: CEUR Workshop Proceedings
Series volume: Vol-1391
Event location: Toulouse, France
Keywords: Secure Data; authorship verification, equal error rates, intrinsic
ID number: TUD-CS-2015-12073
Division(s)/department(s): LOEWE > LOEWE-Zentren > CASED – Center for Advanced Security Research Darmstadt
LOEWE > LOEWE-Zentren
LOEWE
Date deposited: 30 Dec 2016 20:23
Last modified: 17 May 2018 13:02