
Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads

Jamison, Emily and Gurevych, Iryna
Angelova, Galia and Bontcheva, Kalina and Mitkov, Ruslan (eds.) (2013):
Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads.
In: Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013), INCOMA Ltd., Hissar, Bulgaria, [Online-Edition: http://www.aclweb.org/anthology/R13-1042],
[Conference or Workshop Item]

Abstract

Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both class-balanced and class-imbalanced settings, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.
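To make "pairwise classification using text similarity measures" concrete, here is a minimal sketch, assuming each input email has already been stripped of quoted text. It is not the authors' system: the paper compares many content, style, and structure similarity features, whereas this sketch uses a single content feature (cosine similarity of bag-of-words counts) and a hypothetical fixed threshold of 0.3 in place of a trained classifier. The final grouping step, which links every pair judged "same thread" with union-find, is likewise an illustrative assumption; the abstract does not say how pairwise decisions are turned into threads. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Content similarity: cosine of the two emails' bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def same_thread(a: str, b: str, threshold: float = 0.3) -> bool:
    """Hypothetical pairwise classifier: one similarity feature, fixed threshold."""
    return cosine_similarity(a, b) >= threshold


def disentangle(emails: list[str], threshold: float = 0.3) -> list[set[int]]:
    """Group email indices into threads by linking every positive pair (union-find)."""
    parent = list(range(len(emails)))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Classify every unordered pair of emails and merge the positive ones.
    for i, j in combinations(range(len(emails)), 2):
        if same_thread(emails[i], emails[j], threshold):
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(emails)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


if __name__ == "__main__":
    emails = [
        "Can we move the pipeline meeting to Friday?",
        "Friday works for the pipeline meeting.",
        "Attached is the March gas trading report.",
    ]
    print(disentangle(emails))  # [{0, 1}, {2}]
```

Because every pair is classified independently, the number of decisions grows quadratically with the number of emails; a learned classifier over richer feature sets would slot in by replacing `same_thread`.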

Item Type: Conference or Workshop Item
Published: 2013
Editors: Angelova, Galia and Bontcheva, Kalina and Mitkov, Ruslan
Creators: Jamison, Emily and Gurevych, Iryna
Title: Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads
Language: English
Title of Book: Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013)
Publisher: INCOMA Ltd.
Uncontrolled Keywords: UKP_p_ItForensics;reviewed;UKP_a_TexMinAn
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Event Location: Hissar, Bulgaria
Date Deposited: 31 Dec 2016 14:29
Official URL: http://www.aclweb.org/anthology/R13-1042
Identification Number: TUD-CS-2013-0208