Peyrard, Maxime ; Botschen, Teresa ; Gurevych, Iryna (2017)
Learning to Score System Summaries for Better Content Selection Evaluation.
EMNLP Workshop "New Frontiers in Summarization", Copenhagen, Denmark (7 September 2017)
Conference publication, Bibliography
Abstract
The evaluation of summaries is a challenging but crucial task in the field of summarization. In this work, we propose to learn an automatic scoring metric from the human judgments available as part of classical summarization datasets such as TAC-2008 and TAC-2009. Any existing automatic scoring metric can be included as a feature, and the model learns the combination that exhibits the best correlation with human judgments. The reliability of the new metric is tested in a further manual evaluation in which we ask humans to evaluate summaries covering the whole scoring spectrum of the metric. We release the trained metric as an open-source tool.
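The abstract's core idea is to treat existing automatic metrics as features and learn the combination that correlates best with human judgments. The sketch below illustrates that idea under stated assumptions and is not the authors' model: it assumes a linear combination fitted by ordinary least squares (a swapped-in choice, since the abstract does not name the learner), the helper names `fit_weights`, `score` and `pearson` are hypothetical, and the metric values and human scores are invented toy numbers, not TAC data. Least squares is a convenient stand-in because, among all linear combinations of the features, the OLS fitted values have the highest Pearson correlation with the target.

```python
from math import sqrt
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's lists are untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(features: Sequence[Sequence[float]], human: Sequence[float],
                ridge: float = 0.0) -> List[float]:
    """Hypothetical helper: least-squares weights (last entry = bias) mapping
    per-summary metric scores to human scores. A small ridge > 0 can help
    when the input metrics are nearly collinear."""
    rows = [list(f) + [1.0] for f in features]  # append a constant bias column
    k = len(rows[0])
    # Normal equations: (X^T X + ridge * I) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, human)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def score(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Apply the learned combination to one summary's metric scores."""
    return sum(w * f for w, f in zip(weights, list(feats) + [1.0]))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data (hypothetical): two metric scores per summary, e.g. ROUGE-1 and ROUGE-2.
features = [[0.1, 0.05], [0.2, 0.02], [0.3, 0.09], [0.4, 0.10], [0.5, 0.03]]
human = [0.75, 0.92, 1.19, 1.40, 1.53]  # constructed as 2*m1 + 1*m2 + 0.5

weights = fit_weights(features, human)
combined = [score(weights, f) for f in features]
print("weights:", [round(w, 3) for w in weights])
print("r(metric 1 alone):", round(pearson([f[0] for f in features], human), 3))
print("r(learned combination):", round(pearson(combined, human), 3))
```

Because the toy human scores were built as an exact linear function of the two metrics, the learned combination recovers those weights and correlates more strongly with the human scores than the first metric alone. Swapping in a different learner (for example a ranking objective or a non-linear regressor) would only change `fit_weights`; the correlation check with `pearson` stays the same.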
Entry type: | Conference publication |
---|---|
Published: | 2017 |
Author(s): | Peyrard, Maxime ; Botschen, Teresa ; Gurevych, Iryna |
Entry kind: | Bibliography |
Title: | Learning to Score System Summaries for Better Content Selection Evaluation |
Language: | English |
Publication date: | September 2017 |
Location: | Copenhagen, Denmark |
Publisher: | Association for Computational Linguistics |
Book title: | Proceedings of the EMNLP workshop "New Frontiers in Summarization" |
Event title: | EMNLP workshop "New Frontiers in Summarization" |
Event location: | Copenhagen, Denmark |
Event date: | 7 September 2017 |
URL / URN: | http://www.aclweb.org/anthology/W17-4510 |
Keywords: | Natural Language Processing;AIPHES_corpus;AIPHES_area_c3;AIPHES_area_b2 |
ID number: | TUD-CS-2017-0202 |
Department(s)/field(s): | DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources |
Date deposited: | 04 Jul 2017 10:32 |
Last modified: | 02 Jul 2024 10:13 |