Hierarchy Identification for Automatically Generating Table-of-Contents

Erbs, Nicolai ; Gurevych, Iryna ; Zesch, Torsten
Eds.: Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan (2013)
Hierarchy Identification for Automatically Generating Table-of-Contents.
Hissar, Bulgaria
Conference publication, Bibliography

Abstract

A table-of-contents (TOC) provides a quick reference to a document’s content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints, e.g. from HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs at a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.

Item type: Conference publication
Published: 2013
Editors: Angelova, Galia ; Bontcheva, Kalina ; Mitkov, Ruslan
Author(s): Erbs, Nicolai ; Gurevych, Iryna ; Zesch, Torsten
Entry type: Bibliography
Title: Hierarchy Identification for Automatically Generating Table-of-Contents
Language: English
Publication date: September 2013
Publisher: INCOMA Ltd.
Book title: Proceedings of 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013)
Event location: Hissar, Bulgaria
URL / URN: http://www.aclweb.org/anthology/R13-1033

Keywords: Knowledge Discovery in Scientific Literature;UKP_a_NLP4Wikis;UKP_p_WIWEB;UKP_p_WIKULU;reviewed;UKP_s_JWPL;UKP_s_DKPro_Lab;UKP_s_DKPro_Core;UKP_p_openwindow
ID number: TUD-CS-2013-0198
Department(s)/Research area(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 31 Dec 2016 14:29
Last modified: 24 Jan 2020 12:03