HDT: Hierarchical Document Transformer

He, Haoyu ; Flicke, Markus ; Buchmann, Jan ; Gurevych, Iryna ; Geiger, Andreas (2024)
HDT: Hierarchical Document Transformer.
1st Conference on Language Modeling. Philadelphia, USA (07.10.2024 - 09.10.2024)
Conference publication, bibliography entry

Abstract

In this paper, we propose the Hierarchical Document Transformer (HDT), a novel sparse Transformer architecture tailored for structured hierarchical documents. Such documents are extremely important in numerous domains, including science, law, and medicine, yet most existing solutions are inefficient and fail to make use of the structure inherent to documents. HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy. This approach facilitates information exchange between tokens at different levels while maintaining sparsity, improving computational and memory efficiency and exploiting document structure as an inductive bias. We address the technical challenge of implementing HDT's sample-dependent hierarchical attention pattern by developing a novel sparse attention kernel that accounts for the hierarchical structure of documents. As demonstrated by our experiments, utilizing the structural information present in documents leads to faster convergence, higher sample efficiency, and better performance on downstream tasks.
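The anchor-token mechanism described above lends itself to a compact illustration. The following is a minimal PyTorch sketch of one plausible reading of the hierarchical attention pattern, not the authors' released code or their fused sparse kernel: each position stores the index of its anchor one level up, and attention is allowed between positions that share an anchor (siblings) and between direct anchor/child pairs. The function name hierarchical_mask and the toy [DOC]/[SEC]/[SENT] layout are illustrative assumptions.

import torch
import torch.nn.functional as F

def hierarchical_mask(parent: torch.Tensor) -> torch.Tensor:
    """Boolean (L, L) mask, True where attention is permitted.

    parent[i] is the position of token i's anchor one level up
    (the document anchor points to itself). Position i may attend
    to j iff the two share an anchor (siblings, including i == j)
    or one is the other's anchor (parent/child).
    """
    idx = torch.arange(parent.numel(), device=parent.device)
    siblings = parent.unsqueeze(1) == parent.unsqueeze(0)   # same anchor
    j_anchors_i = parent.unsqueeze(1) == idx.unsqueeze(0)   # j is i's anchor
    i_anchors_j = idx.unsqueeze(1) == parent.unsqueeze(0)   # i is j's anchor
    return siblings | j_anchors_i | i_anchors_j

# Toy document: 0=[DOC]; 1=[SEC]; 2=[SENT] with words 3-4; 5=[SENT] with words 6-7.
parent = torch.tensor([0, 0, 1, 2, 2, 1, 5, 5])
mask = hierarchical_mask(parent)                    # (8, 8) block-sparse pattern

q = k = v = torch.randn(1, 1, parent.numel(), 16)   # (batch, heads, L, head_dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

Note that a dense boolean mask like this only visualizes the pattern: the efficiency gains reported in the paper come from a kernel that exploits the sparsity directly instead of materializing the full L x L mask.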

Entry type: Conference publication
Published: 2024
Author(s): He, Haoyu ; Flicke, Markus ; Buchmann, Jan ; Gurevych, Iryna ; Geiger, Andreas
Kind of entry: Bibliography
Title: HDT: Hierarchical Document Transformer
Language: English
Publication date: 10 July 2024
Event title: 1st Conference on Language Modeling
Event location: Philadelphia, USA
Event dates: 07.10.2024 - 09.10.2024
URL / URN: https://openreview.net/forum?id=dkpeWQRmlc#discussion
Free keywords: UKP_p_InterText, UKP_p_LOEWE_Spitzenprofessur
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Ubiquitous Knowledge Processing
Date deposited: 25 Oct 2024 14:19
Last modified: 25 Oct 2024 14:19