Towards Clustering of Web-based Document Structures

Dehmer, Matthias and Emmert-Streib, Frank and Kilian, Jürgen and Zulauf, Andreas (2005):
Towards Clustering of Web-based Document Structures.
In: Proceedings of the International Conference on Enformatika, Systems Sciences and Engineering, Krakow/Poland, Enformatika 9, pp. 304--310, [Conference or Workshop Item]

Abstract

Methods for organizing web data into groups, in order to analyze web-based hypertext data and facilitate data availability, are very important given the number of documents available online. The task of clustering web-based document structures therefore has many applications, e.g., improving information retrieval on the web, better understanding user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than the usual directed rooted trees, e.g., DOM trees. As an important preprocessing step, we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then we apply agglomerative clustering to the resulting similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach to Web Usage Mining as future work.
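
The clustering step named in the abstract can be illustrated with a minimal sketch. It assumes a precomputed symmetric similarity matrix with values in [0, 1]; the matrix entries, the average-linkage rule, the stopping threshold, and the function names are all hypothetical choices made for illustration, not the similarity measure d or the procedure used in the paper.

```python
from itertools import combinations


def average_linkage(sim, a, b):
    """Mean pairwise similarity between clusters a and b (assumed linkage rule)."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerative_clusters(sim, threshold):
    """Repeatedly merge the most similar pair of clusters.

    Stops when no pair of clusters reaches the similarity threshold.
    """
    clusters = [[i] for i in range(len(sim))]  # start with singletons
    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for x, y in combinations(range(len(clusters)), 2):
            score = average_linkage(sim, clusters[x], clusters[y])
            if score >= best_score:
                best_pair, best_score = (x, y), score
        if best_pair is None:
            break  # no remaining pair is similar enough to merge
        x, y = best_pair
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity values for four hypertext structures;
# these numbers are invented and do not come from the paper.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
clusters = agglomerative_clusters(similarity, threshold=0.5)
```

With these invented values, structures 0 and 1 end up in one cluster and structures 2 and 3 in another. Lowering the threshold lets merging continue until a single cluster remains.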

Item Type: Conference or Workshop Item
Published: 2005
Creators: Dehmer, Matthias and Emmert-Streib, Frank and Kilian, Jürgen and Zulauf, Andreas
Title: Towards Clustering of Web-based Document Structures
Language: German
Title of Book: Proceedings of the International Conference on Enformatika, Systems Sciences and Engineering, Krakow/Poland, Enformatika 9
Uncontrolled Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining
Divisions: 20 Department of Computer Science > Telecooperation
20 Department of Computer Science
Date Deposited: 31 Dec 2016 12:59
Identification Number: DEKZ:2005