WannaDB: Ad-hoc SQL Queries over Text Collections

Hättasch, Benjamin ; Vogel, Liane ; Bodensohn, Jan-Micha ; Urban, Matthias ; Binnig, Carsten (2023)
WannaDB: Ad-hoc SQL Queries over Text Collections.
20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS). Dresden, Germany (06.-10.03.2023)
doi: 10.18420/BTW2023-08
Conference or Workshop Item, Bibliography

Abstract

In this paper, we propose a new system called WannaDB that allows users to interactively perform structured explorations of text collections in an ad-hoc manner. Extracting structured data from text is a classical problem for which a plethora of approaches and even industry-scale systems already exist. However, these approaches lack the ability to support the ad-hoc exploration of texts using structured queries. The main idea of WannaDB is to incorporate user interaction to support ad-hoc SQL queries over text collections using a new two-phase approach. First, a superset of information nuggets is extracted from the texts using existing extractors such as named entity recognizers. Then, based on embeddings, the extractions are interactively matched to a structured table definition requested by the user. In our evaluation, we show that WannaDB is thus able to extract structured data of high quality from a broad range of (real-world) text collections without the need to design extraction pipelines upfront.
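The two-phase idea described in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not WannaDB's actual implementation: the regex-based `extract_nuggets` is a hypothetical stand-in for a real named entity recognizer, bag-of-words vectors stand in for learned embeddings, and the `confirm` callback simulates the user's feedback. All function and class names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Nugget:
    doc_id: int
    text: str
    label: str    # coarse type assigned by the (stand-in) extractor
    context: str  # lower-cased words preceding the nugget


def extract_nuggets(doc_id, text, window=3):
    """Phase 1 (hypothetical stand-in for an NER extractor): over-extract
    every number and every capitalized word as a candidate nugget."""
    tokens = [t.strip(".,;:") for t in text.split()]
    nuggets = []
    for i, word in enumerate(tokens):
        if re.fullmatch(r"\d+(\.\d+)?", word):
            label = "NUMBER"
        elif word[:1].isupper():
            label = "CAPITALIZED"
        else:
            continue
        context = " ".join(t.lower() for t in tokens[max(0, i - window):i])
        nuggets.append(Nugget(doc_id, word, label, context))
    return nuggets


def vectorize(words):
    """Bag-of-words vector; a toy substitute for learned embeddings."""
    return Counter(words)


def nugget_vector(nugget):
    return vectorize(nugget.context.split() + [nugget.label])


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_attribute(attribute, nuggets_by_doc, confirm):
    """Phase 2 (simplified): fill one table column. For each document, the
    candidates are ranked by similarity to the attribute name and to all
    previously confirmed nuggets; the simulated user (`confirm`) accepts or
    rejects them in that order."""
    prototypes = [vectorize(attribute.lower().split())]
    column = {}
    for doc_id, nuggets in nuggets_by_doc.items():
        column[doc_id] = None  # stays empty if every candidate is rejected
        ranked = sorted(
            nuggets,
            key=lambda n: max(cosine(nugget_vector(n), p) for p in prototypes),
            reverse=True,
        )
        for nugget in ranked:
            if confirm(attribute, nugget):
                column[doc_id] = nugget.text
                # Confirmed matches become extra prototypes for later documents.
                prototypes.append(nugget_vector(nugget))
                break
    return column


docs = [
    "Berlin has a population of 3645000 and was founded in 1237.",
    "Founded in 1200, Hamburg has a population of 1841000.",
]
nuggets_by_doc = {i: extract_nuggets(i, d) for i, d in enumerate(docs)}
truth = {"3645000", "1841000"}  # simulated user knowledge
print(match_attribute("population", nuggets_by_doc, lambda a, n: n.text in truth))
# → {0: '3645000', 1: '1841000'}
```

Over-extracting in the first phase is cheap because the second phase only has to rank candidates, and feeding confirmed matches back as prototypes is one simple way to let user feedback improve the ranking for later documents. The interaction strategy of the actual system may differ from this loop.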

Item Type: Conference or Workshop Item
Published: 2023
Creators: Hättasch, Benjamin ; Vogel, Liane ; Bodensohn, Jan-Micha ; Urban, Matthias ; Binnig, Carsten
Type of entry: Bibliography
Title: WannaDB: Ad-hoc SQL Queries over Text Collections
Language: English
Date: 10 March 2023
Publisher: Gesellschaft für Informatik e.V.
Book Title: Datenbanksysteme für Business, Technologie und Web (BTW 2023)
Series: Lecture Notes in Informatics
Series Volume: P-331
Event Title: 20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS)
Event Location: Dresden, Germany
Event Dates: 06.-10.03.2023
DOI: 10.18420/BTW2023-08
Uncontrolled Keywords: systems_wannadb, systems_softwarecampus, systems_intexplore
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Data and AI Systems
Date Deposited: 24 Jul 2023 13:03
Last Modified: 25 Jul 2023 16:08
PPN: 509917925