Visual Interactive Creation and Validation of Text Clustering Workflows to Explore Document Collections

Ruppert, Tobias ; Staab, Michael ; Bannach, Andreas ; Lücke-Tieke, Hendrik ; Bernard, Jürgen ; Kuijper, Arjan ; Kohlhammer, Jörn (2017)
Visual Interactive Creation and Validation of Text Clustering Workflows to Explore Document Collections.
IS&T International Symposium on Electronic Imaging.
doi: 10.2352/ISSN.2470-1173.2017.1.VDA-388
Conference publication, bibliography

Abstract

The exploration of text document collections is a complex and cumbersome task. Clustering techniques can help to group documents based on their content for the generation of overviews. However, the underlying clustering workflows comprising preprocessing, feature selection, clustering algorithm selection and parameterization offer several degrees of freedom. Since no "best" clustering workflow exists, users have to evaluate clustering results based on the data and analysis tasks at hand. In our approach, we present an interactive system for the creation and validation of text clustering workflows with the goal to explore document collections. The system allows users to control every step of the text clustering workflow. First, users are supported in the feature selection process via feature selection metrics-based feature ranking and linguistic filtering (e.g., part-of-speech filtering). Second, users can choose between different clustering methods and their parameterizations. Third, the clustering results can be explored based on the cluster content (documents and relevant feature terms), and cluster quality measures. Fourth, the results of different clusterings can be compared, and frequent document subsets in clusters can be identified. We validate the usefulness of the system with a usage scenario describing how users can explore document collections in a visual and interactive way.
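The four workflow steps named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' system: the function names and sample documents are hypothetical, a stopword list stands in for linguistic filtering, document frequency stands in for the feature selection metrics, plain k-means with fixed seeding stands in for the selectable clustering methods, and the Rand index stands in for the comparison of clusterings.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a stand-in for linguistic filtering.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for", "on", "by"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def select_features(token_lists, top_n):
    """Step 1: rank terms by document frequency and keep the top_n."""
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return sorted(df, key=lambda term: (-df[term], term))[:top_n]

def tfidf_vectors(token_lists, features):
    """Build L2-normalised TF-IDF vectors over the selected features."""
    n = len(token_lists)
    df = {f: sum(1 for tokens in token_lists if f in tokens) for f in features}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vec = [tf[f] * (math.log((1 + n) / (1 + df[f])) + 1.0) for f in features]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def kmeans(vectors, k, iterations=20):
    """Step 2: plain k-means, seeded deterministically with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def top_terms(vectors, labels, features, cluster, n=3):
    """Step 3: describe a cluster by its highest-weighted feature terms."""
    members = [v for v, label in zip(vectors, labels) if label == cluster]
    weights = [sum(col) for col in zip(*members)]
    ranked = sorted(zip(features, weights), key=lambda fw: (-fw[1], fw[0]))
    return [f for f, w in ranked[:n] if w > 0]

def rand_index(labels_a, labels_b):
    """Step 4: fraction of document pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical sample collection.
DOCS = [
    "cluster analysis of text documents",
    "stock market prices fall",
    "text documents cluster by topic",
    "market prices and stock trading",
]

tokens = [tokenize(d) for d in DOCS]
features = select_features(tokens, top_n=6)
vectors = tfidf_vectors(tokens, features)
labels = kmeans(vectors, k=2)
print(labels, top_terms(vectors, labels, features, cluster=labels[0]))
```

Rerunning `select_features` with a different `top_n`, or `kmeans` with a different `k`, and passing the two label lists to `rand_index` gives a rough analogue of comparing alternative workflows on the same collection.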

Entry type: Conference publication
Published: 2017
Author(s): Ruppert, Tobias ; Staab, Michael ; Bannach, Andreas ; Lücke-Tieke, Hendrik ; Bernard, Jürgen ; Kuijper, Arjan ; Kohlhammer, Jörn
Entry kind: Bibliography
Title: Visual Interactive Creation and Validation of Text Clustering Workflows to Explore Document Collections
Language: English
Publication year: 2017
Place: Springfield
Publisher: Society for Imaging Science and Technology
Title of journal, newspaper, or series: Electronic Imaging
Issue number: 1
Book title: Electronic Imaging, Visualization and Data Analysis
Event title: IS&T International Symposium on Electronic Imaging
DOI: 10.2352/ISSN.2470-1173.2017.1.VDA-388
URL / URN: https://doi.org/10.2352/ISSN.2470-1173.2017.1.VDA-388
Uncontrolled keywords: Visual analytics, Information visualization, Text mining, Text analysis, Clustering
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Interactive Graphics Systems
20 Department of Computer Science > Mathematical and Applied Visual Computing
Date deposited: 05 May 2020 15:24
Last modified: 05 May 2020 15:24