Hierarchy Identification for Automatically Generating Table-of-Contents

Erbs, Nicolai and Gurevych, Iryna and Zesch, Torsten; Angelova, Galia and Bontcheva, Kalina and Mitkov, Ruslan (eds.) (2013):
Hierarchy Identification for Automatically Generating Table-of-Contents.
In: Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013), INCOMA Ltd., Hissar, Bulgaria, pp. 252-260, [Online-Edition: http://www.aclweb.org/anthology/R13-1033]
[Conference or Workshop Item]

Abstract

A table-of-contents (TOC) provides a quick reference to a document’s content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints, e.g. from HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs at a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.

Item Type: Conference or Workshop Item
Published: 2013
Editors: Angelova, Galia and Bontcheva, Kalina and Mitkov, Ruslan
Creators: Erbs, Nicolai and Gurevych, Iryna and Zesch, Torsten
Title: Hierarchy Identification for Automatically Generating Table-of-Contents
Language: English
Title of Book: Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013)
Publisher: INCOMA Ltd.
Uncontrolled Keywords: Knowledge Discovery in Scientific Literature;UKP_a_NLP4Wikis;UKP_p_WIWEB;UKP_p_WIKULU;reviewed;UKP_s_JWPL;UKP_s_DKPro_Lab;UKP_s_DKPro_Core;UKP_p_openwindow
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Event Location: Hissar, Bulgaria
Date Deposited: 31 Dec 2016 14:29
Official URL: http://www.aclweb.org/anthology/R13-1033
Identification Number: TUD-CS-2013-0198