DBPal: A Fully Pluggable NL2SQL Training Pipeline

Weir, Nathaniel and Utama, Prasetya and Galakatos, Alex and Crotty, Andrew and Ilkhechi, Amir and Ramaswamy, Shekar and Bhushan, Rohin and Geisler, Nadja and Hättasch, Benjamin and Eger, Steffen and Cetintemel, Ugur and Binnig, Carsten; Maier, David and Pottinger, Rachel and Doan, AnHai and Tan, Wang-Chiew and Alawini, Abdussalam and Ngo, Hung Q. (eds.) (2020):
DBPal: A Fully Pluggable NL2SQL Training Pipeline.
In: SIGMOD'20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 2347-2361,
ACM, SIGMOD/PODS '20: International Conference on Management of Data, virtual conference, 14-19 June 2020, ISBN 978-1-4503-6735-6,
DOI: 10.1145/3318464.3380589,
[Conference or Workshop Item]

Abstract

Natural language is a promising alternative interface to DBMSs because it enables non-technical users to formulate complex questions in a more concise manner than SQL. Recently, deep learning has gained traction for translating natural language to SQL, since similar ideas have been successful in the related domain of machine translation. However, the core problem with existing deep learning approaches is that they require an enormous amount of training data in order to provide accurate translations. This training data is extremely expensive to curate, since it generally requires humans to manually annotate natural language examples with the corresponding SQL queries (or vice versa). Based on these observations, we propose DBPal, a new approach that augments existing deep learning techniques in order to improve the performance of models for natural language to SQL translation. More specifically, we present a novel training pipeline that automatically generates synthetic training data in order to (1) improve overall translation accuracy, (2) increase robustness to linguistic variation, and (3) specialize the model for the target database. As we show, our DBPal training pipeline is able to improve both the accuracy and linguistic robustness of state-of-the-art natural language to SQL translation models.

Item Type: Conference or Workshop Item
Published: 2020
Editors: Maier, David and Pottinger, Rachel and Doan, AnHai and Tan, Wang-Chiew and Alawini, Abdussalam and Ngo, Hung Q.
Creators: Weir, Nathaniel and Utama, Prasetya and Galakatos, Alex and Crotty, Andrew and Ilkhechi, Amir and Ramaswamy, Shekar and Bhushan, Rohin and Geisler, Nadja and Hättasch, Benjamin and Eger, Steffen and Cetintemel, Ugur and Binnig, Carsten
Title: DBPal: A Fully Pluggable NL2SQL Training Pipeline
Language: English
Title of Book: SIGMOD'20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Publisher: ACM
ISBN: 978-1-4503-6735-6
Uncontrolled Keywords: dm, dm_nlidb, dm_dbpal
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Data Management
DFG-Graduiertenkollegs
DFG-Graduiertenkollegs > Research Training Group 1994 Adaptive Preparation of Information from Heterogeneous Sources
Event Title: SIGMOD/PODS '20: International Conference on Management of Data
Event Location: virtual conference
Event Dates: 14-19 June 2020
Date Deposited: 25 May 2021 08:05
DOI: 10.1145/3318464.3380589