Hilprecht, Benjamin ; Binnig, Carsten (2021)
ReStore - Neural Data Completion for Relational Databases.
SIGMOD/PODS '21: International Conference on Management of Data. Virtual conference (20.06.2021-25.06.2021)
doi: 10.1145/3448016.3457264
Conference publication, Bibliography
Abstract
Classical approaches for OLAP assume that the data of all tables is complete. However, in case of incomplete tables with missing tuples, classical approaches fail since the result of a SQL aggregate query might significantly differ from the results computed on the full dataset. Today, the only way to deal with missing data is to manually complete the dataset which causes not only high efforts but also requires good statistical skills to determine when a dataset is actually complete. In this paper, we propose an automated approach for relational data completion called ReStore using a new class of (neural) schema-structured completion models that are able to synthesize data which resembles the missing tuples. As we show in our evaluation, this efficiently helps to reduce the relative error of aggregate queries by up to 390% on real-world data compared to using the incomplete data directly for query answering.
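The motivating problem in the abstract, that an aggregate query over a table with missing tuples can differ substantially from the same query over the complete table, can be illustrated with a small sketch. This Python example is not ReStore and does not implement its completion models. It uses a hypothetical `sales` table with invented values, runs SUM, COUNT and AVG over a complete and an incomplete copy via the standard-library `sqlite3` module, and reports the relative error, assumed here to be |estimate - truth| / |truth| (the paper's exact metric definition may differ). The helpers `relative_error` and `aggregate` are hypothetical names chosen for illustration.

```python
import sqlite3


def relative_error(estimate: float, truth: float) -> float:
    # Assumed metric: |estimate - truth| / |truth|; undefined when the true answer is 0.
    if truth == 0:
        raise ValueError("relative error is undefined when the true answer is 0")
    return abs(estimate - truth) / abs(truth)


# Hypothetical toy data (region, revenue); these values are invented, not from the paper.
FULL_ROWS = [
    ("north", 120.0), ("north", 80.0), ("south", 40.0),
    ("south", 60.0), ("south", 50.0), ("east", 150.0),
]
# Incomplete copy: the last two tuples are missing, as if they had never been loaded.
INCOMPLETE_ROWS = FULL_ROWS[:4]


def aggregate(rows):
    """Load rows into an in-memory SQLite table and return (SUM, COUNT, AVG) of revenue."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT SUM(revenue), COUNT(*), AVG(revenue) FROM sales"
        ).fetchone()
    finally:
        conn.close()


full = aggregate(FULL_ROWS)
incomplete = aggregate(INCOMPLETE_ROWS)
# Compare each aggregate computed on the incomplete table against the full-table answer.
for name, truth, estimate in zip(("SUM", "COUNT", "AVG"), full, incomplete):
    print(f"{name}: full={truth:.2f} incomplete={estimate:.2f} "
          f"relative error={relative_error(estimate, truth):.2f}")
```

Running it prints relative errors of 0.40 for SUM (300 vs. 500), 0.33 for COUNT (4 vs. 6) and 0.10 for AVG (75.00 vs. 83.33). SUM and COUNT are off simply because tuples are absent, while AVG is off because the missing tuples (notably the `east` row) are not representative of the ones that remain; synthesizing tuples that resemble the missing ones, as the abstract describes, targets exactly this gap.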
Entry type: | Conference publication |
---|---|
Published: | 2021 |
Author(s): | Hilprecht, Benjamin ; Binnig, Carsten |
Entry kind: | Bibliography |
Title: | ReStore - Neural Data Completion for Relational Databases |
Language: | English |
Publication date: | 9 June 2021 |
Publisher: | ACM |
Book title: | SIGMOD/PODS '21: Proceedings of the 2021 International Conference on Management of Data |
Event title: | SIGMOD/PODS '21: International Conference on Management of Data |
Event location: | Virtual conference |
Event dates: | 20.06.2021-25.06.2021 |
DOI: | 10.1145/3448016.3457264 |
Keywords: | incomplete data, deep autoregressive models, relational data, data completion, data-driven learning |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Data Management (renamed Data and AI Systems in 2022) |
Date deposited: | 13 Jul 2021 08:36 |
Last modified: | 13 Jul 2021 08:36 |