ReStore - Neural Data Completion for Relational Databases

Hilprecht, Benjamin ; Binnig, Carsten (2021)
ReStore - Neural Data Completion for Relational Databases.
SIGMOD/PODS '21: International Conference on Management of Data. virtual Conference (20.-25.06.2021)
doi: 10.1145/3448016.3457264
Conference publication, bibliography

Abstract

Classical approaches for OLAP assume that the data of all tables is complete. However, in case of incomplete tables with missing tuples, classical approaches fail since the result of a SQL aggregate query might significantly differ from the results computed on the full dataset. Today, the only way to deal with missing data is to manually complete the dataset which causes not only high efforts but also requires good statistical skills to determine when a dataset is actually complete. In this paper, we propose an automated approach for relational data completion called ReStore using a new class of (neural) schema-structured completion models that are able to synthesize data which resembles the missing tuples. As we show in our evaluation, this efficiently helps to reduce the relative error of aggregate queries by up to 390% on real-world data compared to using the incomplete data directly for query answering.

Item type: Conference publication
Published: 2021
Author(s): Hilprecht, Benjamin ; Binnig, Carsten
Type of entry: Bibliography
Title: ReStore - Neural Data Completion for Relational Databases
Language: English
Publication date: 9 June 2021
Publisher: ACM
Book title: SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data
Event title: SIGMOD/PODS '21: International Conference on Management of Data
Event location: virtual Conference
Event dates: 20.-25.06.2021
DOI: 10.1145/3448016.3457264
Uncontrolled keywords: incomplete data, deep autoregressive models, relational data, data completion, data-driven learning
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Data Management (renamed Data and AI Systems in 2022)
Date deposited: 13 Jul 2021 08:36
Last modified: 13 Jul 2021 08:36