
Demonstrating ASET: Ad-Hoc Structured Exploration of Text Collections

Hättasch, Benjamin ; Bodensohn, Jan-Micha ; Binnig, Carsten (2022)
Demonstrating ASET: Ad-Hoc Structured Exploration of Text Collections.
2022 International Conference on Management of Data. Philadelphia, USA (12-17 June 2022)
doi: 10.1145/3514221.3520174
Conference or Workshop Item, Bibliography

Abstract

In this demo, we present ASET, a novel tool to explore the contents of unstructured data (text) by automatically transforming relevant parts into tabular form. ASET works in an ad-hoc manner without the need to curate extraction pipelines for the (unseen) text collection or to annotate large amounts of training data. The main idea is to use a new two-phased approach that first extracts a superset of information nuggets from the texts using existing extractors such as named entity recognizers. In a second step, it leverages embeddings and a novel matching strategy to match the extractions to a structured table definition as requested by the user. This demo features the ASET system with a graphical user interface that allows people without machine learning or programming expertise to explore text collections efficiently. This can be done in a self-directed and flexible manner, and ASET provides an intuitive impression of the result quality.
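The two-phased idea described above can be illustrated with a minimal sketch. This is not ASET's implementation: the capitalized-token/number extractor stands in for real extractors such as named entity recognizers, and bag-of-words context vectors with cosine similarity stand in for the learned embeddings and the matching strategy. All function names (`extract_nuggets`, `embed`, `cosine`, `fill_row`) and the example sentence are hypothetical.

```python
import math
import re
from collections import Counter


def extract_nuggets(text, window=3):
    """Phase 1 (stand-in): over-extract candidate nuggets with their context words."""
    tokens = re.findall(r"\w+", text)
    nuggets = []
    for i, tok in enumerate(tokens):
        # Toy extractor: numbers and capitalized tokens (except the first token).
        if tok.isdigit() or (tok[0].isupper() and i > 0):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            nuggets.append((tok, [c.lower() for c in context]))
    return nuggets


def embed(words):
    """Assumption: a bag-of-words vector replaces a learned embedding."""
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fill_row(text, attributes):
    """Phase 2 (stand-in): match each requested attribute to its most similar nugget."""
    nuggets = extract_nuggets(text)
    row = {}
    for attr in attributes:
        attr_vec = embed(attr.lower().split())
        scored = [(cosine(attr_vec, embed(ctx)), value) for value, ctx in nuggets]
        best_score, best_value = max(scored, default=(0.0, None))
        # Leave the cell empty when no nugget has any similarity to the attribute.
        row[attr] = best_value if best_score > 0 else None
    return row


text = "The flight was operated by the airline Lufthansa and carried 154 passengers."
row = fill_row(text, ["airline", "passengers", "pilot"])
print(row)
```

Here the user's table definition is just the list of attribute names. Phase 1 deliberately extracts more candidates than needed, and phase 2 decides which candidate, if any, fills each cell.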

Item Type: Conference or Workshop Item
Published: 2022
Creators: Hättasch, Benjamin ; Bodensohn, Jan-Micha ; Binnig, Carsten
Type of entry: Bibliography
Title: Demonstrating ASET: Ad-Hoc Structured Exploration of Text Collections
Language: English
Date: July 2022
Publisher: ACM
Book Title: SIGMOD'22: Proceedings of the 2022 International Conference on Management of Data
Event Title: 2022 International Conference on Management of Data
Event Location: Philadelphia, USA
Event Dates: 12-17 June 2022
DOI: 10.1145/3514221.3520174
Uncontrolled Keywords: systems_aset, systems_wannadb, systems_intexplore, text to table, interactive text exploration, matching embeddings
Divisions: 20 Department of Computer Science; 20 Department of Computer Science > Data and AI Systems
Date Deposited: 06 Jun 2023 12:37
Last Modified: 02 Aug 2023 13:36
PPN: 510088619