Jamison, Emily ; Gurevych, Iryna
Eds.: Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan (2013)
Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads.
Hissar, Bulgaria
Conference publication, Bibliography
Abstract
Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both a class-balanced and class-imbalanced setting, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly-extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.
Item type: | Conference publication |
---|---|
Published: | 2013 |
Editors: | Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan |
Author(s): | Jamison, Emily ; Gurevych, Iryna |
Entry type: | Bibliography |
Title: | Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads |
Language: | English |
Publication date: | September 2013 |
Publisher: | INCOMA Ltd. |
Book title: | Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013) |
Event location: | Hissar, Bulgaria |
URL / URN: | http://www.aclweb.org/anthology/R13-1042 |
Uncontrolled keywords: | UKP_p_ItForensics;reviewed;UKP_a_TexMinAn |
ID number: | TUD-CS-2013-0208 |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Ubiquitous Knowledge Processing |
Date deposited: | 31 Dec 2016 14:29 |
Last modified: | 24 Jan 2020 12:03 |