
DBPal: A Fully Pluggable NL2SQL Training Pipeline

Weir, Nathaniel ; Utama, Prasetya ; Galakatos, Alex ; Crotty, Andrew ; Ilkhechi, Amir ; Ramaswamy, Shekar ; Bhushan, Rohin ; Geisler, Nadja ; Hättasch, Benjamin ; Eger, Steffen ; Cetintemel, Ugur ; Binnig, Carsten
Eds.: Maier, David ; Pottinger, Rachel ; Doan, AnHai ; Tan, Wang-Chiew ; Alawini, Abdussalam ; Ngo, Hung Q. (2020)
DBPal: A Fully Pluggable NL2SQL Training Pipeline.
SIGMOD/PODS '20: International Conference on Management of Data. Virtual conference (14.06.2020-19.06.2020)
doi: 10.1145/3318464.3380589
Conference publication, Bibliography

Abstract

Natural language is a promising alternative interface to DBMSs because it enables non-technical users to formulate complex questions in a more concise manner than SQL. Recently, deep learning has gained traction for translating natural language to SQL, since similar ideas have been successful in the related domain of machine translation. However, the core problem with existing deep learning approaches is that they require an enormous amount of training data in order to provide accurate translations. This training data is extremely expensive to curate, since it generally requires humans to manually annotate natural language examples with the corresponding SQL queries (or vice versa). Based on these observations, we propose DBPal, a new approach that augments existing deep learning techniques in order to improve the performance of models for natural language to SQL translation. More specifically, we present a novel training pipeline that automatically generates synthetic training data in order to (1) improve overall translation accuracy, (2) increase robustness to linguistic variation, and (3) specialize the model for the target database. As we show, our DBPal training pipeline is able to improve both the accuracy and linguistic robustness of state-of-the-art natural language to SQL translation models.

Entry type: Conference publication
Published: 2020
Editors: Maier, David ; Pottinger, Rachel ; Doan, AnHai ; Tan, Wang-Chiew ; Alawini, Abdussalam ; Ngo, Hung Q.
Author(s): Weir, Nathaniel ; Utama, Prasetya ; Galakatos, Alex ; Crotty, Andrew ; Ilkhechi, Amir ; Ramaswamy, Shekar ; Bhushan, Rohin ; Geisler, Nadja ; Hättasch, Benjamin ; Eger, Steffen ; Cetintemel, Ugur ; Binnig, Carsten
Record type: Bibliography
Title: DBPal: A Fully Pluggable NL2SQL Training Pipeline
Language: English
Publication date: June 2020
Publisher: ACM
Book title: SIGMOD'20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Event title: SIGMOD/PODS '20: International Conference on Management of Data
Event location: Virtual conference
Event dates: 14.06.2020-19.06.2020
DOI: 10.1145/3318464.3380589
Keywords: dm, dm_nlidb, dm_dbpal
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Data Management (renamed Data and AI Systems in 2022)
DFG Research Training Groups
DFG Research Training Groups > Research Training Group 1994 "Adaptive Informationsaufbereitung aus heterogenen Quellen" (Adaptive Preparation of Information from Heterogeneous Sources)
Date deposited: 25 May 2021 08:05
Last modified: 25 May 2021 08:05