Learning to Score System Summaries for Better Content Selection Evaluation

Peyrard, Maxime ; Botschen, Teresa ; Gurevych, Iryna (2017)
Learning to Score System Summaries for Better Content Selection Evaluation.
Copenhagen, Denmark (September 2017)
Conference or Workshop Item

Abstract

The evaluation of summaries is a challenging but crucial task in the summarization field. In this work, we propose to learn an automatic scoring metric based on the human judgments available as part of classical summarization datasets like TAC-2008 and TAC-2009. Any existing automatic scoring metric can be included as a feature; the model learns the combination exhibiting the best correlation with human judgments. The reliability of the new metric is tested in a further manual evaluation where we ask humans to evaluate summaries covering the whole scoring spectrum of the metric. We release the trained metric as an open-source tool.
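
The idea of combining existing metrics into a learned one can be illustrated with a minimal Python sketch. It is not the authors' released tool: the linear least-squares model, the two unnamed feature metrics, and all numbers below are assumptions made up for illustration. Each summary is represented by the scores that existing metrics assign to it, a linear combination is fitted to the human scores, and Pearson correlation with the human scores is compared before and after combining.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(features, targets):
    """Ordinary least squares via the normal equations; the last weight is the bias."""
    rows = [list(f) + [1.0] for f in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return solve(xtx, xty)


def learned_score(feature_row, weights):
    """Score one summary as the weighted sum of its metric scores plus the bias."""
    return sum(w * x for w, x in zip(weights, list(feature_row) + [1.0]))


# Hypothetical data: one row per summary, one column per existing metric.
metric_scores = [(0.10, 0.80), (0.30, 0.20), (0.50, 0.60),
                 (0.70, 0.10), (0.90, 0.90), (0.40, 0.40)]
# Hypothetical human judgments for the same six summaries.
human_scores = [2.7, 1.1, 2.8, 1.8, 4.4, 2.0]

weights = fit_weights(metric_scores, human_scores)
learned = [learned_score(row, weights) for row in metric_scores]

single_metric_corr = [pearson([row[j] for row in metric_scores], human_scores)
                      for j in range(2)]
learned_corr = pearson(learned, human_scores)

print("single metrics:", [round(c, 3) for c in single_metric_corr])
print("learned metric:", round(learned_corr, 3))
```

On the data it was fitted to, a least-squares combination with a bias term correlates with the targets at least as well as any single feature, so a comparison like this only becomes meaningful on held-out summaries.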

Item Type: Conference or Workshop Item
Published: 2017
Creators: Peyrard, Maxime ; Botschen, Teresa ; Gurevych, Iryna
Type of entry: Bibliography
Title: Learning to Score System Summaries for Better Content Selection Evaluation
Language: English
Date: September 2017
Publisher: Association for Computational Linguistics
Book Title: Proceedings of the EMNLP workshop "New Frontiers in Summarization"
Event Location: Copenhagen, Denmark
Event Dates: September 2017
URL / URN: http://www.aclweb.org/anthology/W17-4510
Uncontrolled Keywords: Natural Language Processing;AIPHES_corpus;AIPHES_area_c3;AIPHES_area_b2
Identification Number: TUD-CS-2017-0202
Divisions: DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Date Deposited: 04 Jul 2017 10:32
Last Modified: 24 Jan 2020 12:03