Hättasch, Benjamin ; Vogel, Liane ; Bodensohn, Jan-Micha ; Urban, Matthias ; Binnig, Carsten (2023)
WannaDB: Ad-hoc SQL Queries over Text Collections.
20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS). Dresden, Germany (06.03.2023-10.03.2023)
doi: 10.18420/BTW2023-08
Conference publication, Bibliography
Abstract
In this paper, we propose a new system called WannaDB that allows users to interactively perform structured explorations of text collections in an ad-hoc manner. Extracting structured data from text is a classical problem for which a plenitude of approaches and even industry-scale systems already exist. However, these approaches lack the ability to support the ad-hoc exploration of texts using structured queries. The main idea of WannaDB is to include user interaction to support ad-hoc SQL queries over text collections using a new two-phase approach. First, a superset of information nuggets is extracted from the texts using existing extractors such as named entity recognizers. Then, based on embeddings, the extractions are interactively matched to a structured table definition requested by the user. In our evaluation, we show that WannaDB is thus able to extract high-quality structured data from a broad range of (real-world) text collections without the need to design extraction pipelines upfront.
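The two-phase idea in the abstract can be illustrated with a small, self-contained sketch. This is not WannaDB's implementation: the regex extractor stands in for real extractors such as named entity recognizers, the bag-of-words vectors stand in for learned embeddings, and the interactive feedback loop is omitted. All names (`extract_nuggets`, `best_match`, the `incidents` table, and the sample sentences) are hypothetical and chosen only for illustration.

```python
import math
import re
import sqlite3
from collections import Counter


def extract_nuggets(text):
    """Phase 1 (toy): over-generate candidate nuggets with a few words of left context."""
    nuggets = []
    # Capitalized words and numbers act as a crude stand-in for NER output.
    for match in re.finditer(r"\b(?:[A-Z][a-z]+|\d+)\b", text):
        left_words = text[:match.start()].split()[-3:]
        context = [w.lower().strip(".,") for w in left_words]
        nuggets.append({"value": match.group(), "context": context})
    return nuggets


def embed(words):
    """Toy embedding: a bag-of-words count vector (assumption, not a learned model)."""
    return Counter(words)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(attribute, nuggets):
    """Phase 2 (toy): pick the nugget whose context is closest to the attribute name."""
    target = embed([attribute])
    scored = [(cosine(target, embed(n["context"])), n["value"]) for n in nuggets]
    score, value = max(scored, default=(0.0, None))
    return value if score > 0 else None


# Hypothetical mini collection and the table definition a user might request.
docs = [
    "the airline Lufthansa reported an incident in the city Dresden in the year 2023",
    "the airline Ryanair delayed a flight in the city Berlin in the year 2022",
]
attributes = ["airline", "city", "year"]

rows = [
    tuple(best_match(attr, extract_nuggets(text)) for attr in attributes)
    for text in docs
]

# Once the table is filled, ordinary SQL can be run over it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (airline TEXT, city TEXT, year TEXT)")
conn.executemany("INSERT INTO incidents VALUES (?, ?, ?)", rows)
result = conn.execute(
    "SELECT airline, city FROM incidents WHERE year = '2023'"
).fetchall()
print(result)
```

In a real system, the matching step would presumably use learned embeddings and ask the user to confirm or correct uncertain matches; the sketch only shows how over-generated extractions can be mapped onto a requested table schema and then queried.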
Item type: | Conference publication |
---|---|
Published: | 2023 |
Author(s): | Hättasch, Benjamin ; Vogel, Liane ; Bodensohn, Jan-Micha ; Urban, Matthias ; Binnig, Carsten |
Type of entry: | Bibliography |
Title: | WannaDB: Ad-hoc SQL Queries over Text Collections |
Language: | English |
Publication date: | 10 March 2023 |
Publisher: | Gesellschaft für Informatik e.V. |
Book title: | Datenbanksysteme für Business, Technologie und Web (BTW 2023) |
Series: | Lecture Notes in Informatics |
Series volume: | P-331 |
Event title: | 20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS) |
Event location: | Dresden, Germany |
Event dates: | 06.03.2023-10.03.2023 |
DOI: | 10.18420/BTW2023-08 |
Uncontrolled keywords: | systems_wannadb, systems_softwarecampus, systems_intexplore |
Divisions: | 20 Department of Computer Science; 20 Department of Computer Science > Data and AI Systems |
Date deposited: | 24 Jul 2023 13:03 |
Last modified: | 25 Jul 2023 16:08 |
PPN: | 509917925 |