Histograms as a side effect of data movement for big data

István, Zsolt ; Woods, Louis ; Alonso, Gustavo (2014)
Histograms as a side effect of data movement for big data.
2014 International Conference on Management of Data. Snowbird, USA (22.06.2014-27.06.2014)
doi: 10.1145/2588555.2612174
Conference publication, bibliography

Abstract

Histograms are a crucial part of database query planning but their computation is resource-intensive. As a consequence, generating histograms on database tables is typically performed as a batch job, separately from query processing. In this paper, we show how to calculate statistics as a side effect of data movement within a DBMS using a hardware accelerator in the data path. This accelerator analyzes tables as they are transmitted from storage to the processing unit, and provides histograms on the data retrieved for queries at virtually no extra performance cost. To evaluate our approach, we implemented this accelerator on an FPGA. This prototype calculates histograms faster and with similar or better accuracy than commercial databases. Moreover, the FPGA can provide various types of histograms such as Equi-depth, Compressed, or Max-diff on the same input data in parallel, without additional overhead.
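
To illustrate the idea of gathering statistics while data moves toward its consumer, here is a minimal software sketch. It is only an analogy under stated assumptions: the paper's accelerator is FPGA hardware in the data path, whereas this sketch uses a pass-through Python iterator. The names HistogramTap and equi_depth are hypothetical and do not come from the paper, and the bucketing rule shown is one simple way to build an equi-depth histogram, not necessarily the authors' method.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


class HistogramTap:
    """Hypothetical pass-through wrapper: counts values as a consumer reads them."""

    def __init__(self, source: Iterable[int]) -> None:
        self._source = source
        self.counts: Counter = Counter()

    def __iter__(self) -> Iterator[int]:
        for value in self._source:
            self.counts[value] += 1  # statistics are gathered in passing
            yield value              # the consumer sees the data unchanged

    def equi_depth(self, buckets: int) -> List[Tuple[int, int, int]]:
        """Return (low, high, row_count) buckets with roughly equal row counts.

        All rows sharing a value stay in one bucket, so skewed data can
        produce uneven or fewer buckets than requested.
        """
        total = sum(self.counts.values())
        if total == 0 or buckets <= 0:
            return []
        target = total / buckets  # ideal number of rows per bucket
        result: List[Tuple[int, int, int]] = []
        low = None
        high = None
        in_bucket = 0
        seen = 0
        for value in sorted(self.counts):
            if low is None:
                low = value
            high = value
            in_bucket += self.counts[value]
            seen += self.counts[value]
            # Close the bucket once the running total reaches the next
            # boundary; the final bucket is closed after the loop.
            if len(result) < buckets - 1 and seen >= target * (len(result) + 1):
                result.append((low, high, in_bucket))
                low, in_bucket = None, 0
        if low is not None:
            result.append((low, high, in_bucket))
        return result
```

A query operator would iterate over a HistogramTap instead of the raw table scan, and equi_depth could then be called once the scan finishes. The counting adds one dictionary update per row in this sketch, so it does not reproduce the "virtually no extra performance cost" that the abstract reports for the hardware implementation.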

Item type: Conference publication
Published: 2014
Author(s): István, Zsolt ; Woods, Louis ; Alonso, Gustavo
Entry type: Bibliography
Title: Histograms as a side effect of data movement for big data
Language: English
Publication date: 18 June 2014
Place: New York, NY
Publisher: ACM
Book title: SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
Event title: 2014 International Conference on Management of Data
Event location: Snowbird, USA
Event date: 22.06.2014-27.06.2014
DOI: 10.1145/2588555.2612174

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Distributed and Networked Systems
Date deposited: 23 Jan 2023 12:34
Last modified: 11 May 2023 08:40
PPN: 50772383X