Towards Logical Hypertext Structure. A Graph-Theoretic Perspective

Mehler, Alexander ; Dehmer, Matthias ; Gleim, Rüdiger (2004)
Towards Logical Hypertext Structure. A Graph-Theoretic Perspective.
Conference or Workshop Item, Bibliography

Abstract

Given the retrieval problem posed by the overwhelming number of documents online, the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized along with HTML tags and link structures. In spite of promising results, this adaptation stays within the framework of IR-specific models, since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation between hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation: In terms of realizational ambiguity, we speak of functional equivalents to the manifestation of the same structure type. In terms of polymorphism, we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. Against this background, we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents.

Item Type: Conference or Workshop Item
Published: 2004
Creators: Mehler, Alexander ; Dehmer, Matthias ; Gleim, Rüdiger
Type of entry: Bibliography
Title: Towards Logical Hypertext Structure. A Graph-Theoretic Perspective
Language: German
Date: 2004
Publisher: Springer
Book Title: Proceedings of I2CS 04 - Innovative Internet Community Systems
Identification Number: dehmertowards04
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Telecooperation
Date Deposited: 31 Dec 2016 12:59
Last Modified: 03 Jun 2018 21:29