WannaDB: Ad-hoc SQL Queries over Text Collections

Hättasch, Benjamin ; Vogel, Liane ; Bodensohn, Jan-Micha ; Urban, Matthias ; Binnig, Carsten (2023)
WannaDB: Ad-hoc SQL Queries over Text Collections.
20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS). Dresden, Germany (06.03.2023-10.03.2023)
doi: 10.18420/BTW2023-08
Conference publication, Bibliography

Abstract

In this paper, we propose a new system called WannaDB that allows users to interactively perform structured explorations of text collections in an ad-hoc manner. Extracting structured data from text is a classical problem for which a plenitude of approaches and even industry-scale systems already exist. However, these approaches lack the ability to support the ad-hoc exploration of texts using structured queries. The main idea of WannaDB is to include user interaction to support ad-hoc SQL queries over text collections using a new two-phased approach. First, a superset of information nuggets is extracted from the texts using existing extractors such as named entity recognizers. Then, based on embeddings, the extractions are interactively matched to a structured table definition as requested by the user. In our evaluation, we show that WannaDB is thus able to extract high-quality structured data from a broad range of (real-world) text collections without the need to design extraction pipelines upfront.

Item type: Conference publication
Published: 2023
Author(s): Hättasch, Benjamin ; Vogel, Liane ; Bodensohn, Jan-Micha ; Urban, Matthias ; Binnig, Carsten
Type of entry: Bibliography
Title: WannaDB: Ad-hoc SQL Queries over Text Collections
Language: English
Date of publication: 10 March 2023
Publisher: Gesellschaft für Informatik e.V.
Book title: Datenbanksysteme für Business, Technologie und Web (BTW 2023)
Series: Lecture Notes in Informatics
Series volume: P-331
Event title: 20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS)
Event location: Dresden, Germany
Event dates: 06.03.2023-10.03.2023
DOI: 10.18420/BTW2023-08
Uncontrolled keywords: systems_wannadb, systems_softwarecampus, systems_intexplore
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Data and AI Systems
Date deposited: 24 Jul 2023 13:03
Last modified: 25 Jul 2023 16:08
PPN: 509917925