Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads

Jamison, Emily ; Gurevych, Iryna
Eds.: Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan (2013)
Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads.
Hissar, Bulgaria
Conference publication, Bibliography

Abstract

Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both a class-balanced and class-imbalanced setting, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly-extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.
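
To make the idea of pairwise classification concrete, here is a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity, the fixed threshold of 0.3, and the grouping of positive pairs into connected components are all assumptions chosen for illustration, and the function names (`cosine_similarity`, `same_thread`, `disentangle`) are hypothetical. The paper itself evaluates a range of content, style, and structure similarity features with a trained classifier.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_thread(a, b, threshold=0.3):
    """Pairwise decision: do two emails belong to the same thread?

    The threshold stands in for a trained classifier (assumption).
    """
    return cosine_similarity(a, b) >= threshold


def disentangle(emails, threshold=0.3):
    """Group emails by linking every pair classified as same-thread.

    Uses union-find to collect connected components; this grouping step
    is an illustrative assumption, not taken from the abstract.
    """
    parent = list(range(len(emails)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(emails)), 2):
        if same_thread(emails[i], emails[j], threshold):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(emails)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `disentangle` on a list of email bodies (with headers and quoted text already removed) returns lists of indices, one list per inferred thread. A single lexical similarity score is only a stand-in here; in practice it would be one feature among several fed to a classifier.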

Item type: Conference publication
Published: 2013
Editors: Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan
Author(s): Jamison, Emily ; Gurevych, Iryna
Entry category: Bibliography
Title: Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads
Language: English
Date of publication: September 2013
Publisher: INCOMA Ltd.
Book title: Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013)
Event location: Hissar, Bulgaria
URL / URN: http://www.aclweb.org/anthology/R13-1042
Uncontrolled keywords: UKP_p_ItForensics;reviewed;UKP_a_TexMinAn
ID number: TUD-CS-2013-0208
Department(s)/Division(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 24 Jan 2020 12:03