Mehler, Alexander ; Dehmer, Matthias ; Gleim, Rüdiger (2004)
Towards Logical Hypertext Structure. A Graph-Theoretic Perspective.
Conference or Workshop Item, Bibliographie
Abstract
Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized just as HTML tags and link structures. In spite of promising results this adaptation stays in the framework of IR specific models since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graphtheory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation of hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation: In terms of realizational ambiguity we speak of functional equivalents to the manifestation of the same structure type. In terms of polymorphism we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. On this background we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents.
Item Type: | Conference or Workshop Item |
---|---|
Erschienen: | 2004 |
Creators: | Mehler, Alexander ; Dehmer, Matthias ; Gleim, Rüdiger |
Type of entry: | Bibliographie |
Title: | Towards Logical Hypertext Structure. A Graph-Theoretic Perspective |
Language: | German |
Date: | 2004 |
Publisher: | Springer |
Book Title: | Proceedings of I2CS 04 - Innovative Internet Community Systems |
Abstract: | Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized just as HTML tags and link structures. In spite of promising results this adaptation stays in the framework of IR specific models since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graphtheory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation of hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation: In terms of realizational ambiguity we speak of functional equivalents to the manifestation of the same structure type. In terms of polymorphism we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. On this background we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents. |
Divisions: | 20 Department of Computer Science 20 Department of Computer Science > Telecooperation |
Date Deposited: | 31 Dec 2016 12:59 |
Last Modified: | 03 Jun 2018 21:29 |
PPN: | |
Export: | |
Suche nach Titel in: | TUfind oder in Google |
Send an inquiry |
Options (only for editors)
Show editorial Details |