Demonstrating ASET: Ad-Hoc Structured Exploration of Text Collections

Hättasch, Benjamin ; Bodensohn, Jan-Micha ; Binnig, Carsten (2022)
Demonstrating ASET: Ad-Hoc Structured Exploration of Text Collections.
2022 International Conference on Management of Data. Philadelphia, USA (June 12-17, 2022)
doi: 10.1145/3514221.3520174
Conference publication, Bibliography

Abstract

In this demo, we present ASET, a novel tool to explore the contents of unstructured data (text) by automatically transforming relevant parts into tabular form. ASET works in an ad-hoc manner without the need to curate extraction pipelines for the (unseen) text collection or to annotate large amounts of training data. The main idea is to use a new two-phased approach that first extracts a superset of information nuggets from the texts using existing extractors such as named entity recognizers. In a second step, it leverages embeddings and a novel matching strategy to match the extractions to a structured table definition as requested by the user. This demo features the ASET system with a graphical user interface that allows people without machine learning or programming expertise to explore text collections efficiently. This can be done in a self-directed and flexible manner, and ASET provides an intuitive impression of the result quality.
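
The two-phase idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not ASET's actual implementation: the regular expressions stand in for real extractors such as named entity recognizers, the character-trigram vectors stand in for learned embeddings, and the function names `extract_nuggets`, `embed`, `cosine`, and `match_row` are hypothetical.

```python
import math
import re
from collections import Counter


def extract_nuggets(text):
    """Phase 1 (toy): over-generate candidate nuggets as (value, label) pairs.

    Assumption: two regexes stand in for real extractors (e.g. an NER model).
    """
    nuggets = []
    # Four-digit years starting with 19 or 20.
    for m in re.finditer(r"\b(?:19|20)\d{2}\b", text):
        nuggets.append((m.group(), "date year"))
    # Two or more consecutive capitalized words, a crude proxy for person names.
    for m in re.finditer(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text):
        nuggets.append((m.group(), "person name"))
    return nuggets


def embed(text):
    """Toy embedding: bag of character trigrams (stand-in for learned embeddings)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_row(nuggets, attributes):
    """Phase 2 (toy): fill each requested attribute with the best-matching nugget.

    A cell stays None when no nugget label has any similarity to the attribute.
    """
    row = {}
    for attr in attributes:
        best_value, best_score = None, 0.0
        for value, label in nuggets:
            score = cosine(embed(attr), embed(label))
            if score > best_score:
                best_value, best_score = value, score
        row[attr] = best_value
    return row


if __name__ == "__main__":
    docs = [
        "The report was written by Jane Doe in 2019.",
        "No details were recorded.",
    ]
    # The user-requested table definition: one column per attribute.
    attributes = ["name", "year"]
    for doc in docs:
        print(match_row(extract_nuggets(doc), attributes))
```

The point of the sketch is the separation of concerns: extraction runs once per document and deliberately produces more candidates than needed, while matching is driven only by the attribute names the user asks for, so a new table definition does not require re-running extraction or training anything.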

Item type: Conference publication
Published: 2022
Author(s): Hättasch, Benjamin ; Bodensohn, Jan-Micha ; Binnig, Carsten
Type of entry: Bibliography
Title: Demonstrating ASET: Ad-Hoc Structured Exploration of Text Collections
Language: English
Publication date: July 2022
Publisher: ACM
Book title: SIGMOD'22: Proceedings of the 2022 International Conference on Management of Data
Event title: 2022 International Conference on Management of Data
Event location: Philadelphia, USA
Event dates: June 12-17, 2022
DOI: 10.1145/3514221.3520174
Uncontrolled keywords: systems_aset, systems_wannadb, systems_intexplore, text to table, interactive text exploration, matching embeddings
Division(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Data and AI Systems
Date deposited: 06 Jun 2023 12:37
Last modified: 02 Aug 2023 13:36
PPN: 510088619