Agnihotri, Pratyush (2024)
Accurate Performance Modeling for Distributed Stream Processing: Methods for Performance Benchmarking and Zero-shot Parallelism Tuning in Distributed and Heterogeneous Environments.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00028144
Dissertation, first publication, publisher's version

Abstract
Distributed Stream Processing (DSP) systems have emerged as a pivotal paradigm, enabling real-time data analysis using distributed cloud resources. Major Internet companies like Amazon and Google build on DSP systems for their real-time data workloads; for instance, Amazon provides Apache Flink as a service for implementing DSP workloads. Parallelism is often a desired property of DSP workloads to meet the timeliness and scaling requirements of current applications, necessitating the use of distributed and multi-core cloud resources. However, cloud resources are heterogeneous, which makes understanding the performance of DSP workloads difficult, as it depends on highly varying compute, storage, and network resources. Therefore, (i) understanding and (ii) predicting the performance of distinct DSP workloads in such heterogeneous cloud environments are both challenging problems. This thesis addresses these two fundamental research challenges by contributing methods for accurate performance modeling of DSP workloads in heterogeneous cloud environments.
First, this thesis contributes methods for performance understanding by proposing PDSP-BENCH, a novel benchmarking system. It tackles three primary shortcomings of existing work: the lack of expressiveness in benchmarking parallel DSP workloads, the lack of support for heterogeneous hardware, and the missing integration of learned DSP models. Unlike existing systems, PDSP-BENCH enables the evaluation of parallel DSP applications and workloads using both synthetic and real-world applications, offering an expressive and scalable solution. Further, it facilitates the systematic training and evaluation of learned DSP models on diverse streaming workloads, which is crucial for optimizing performance. The extensive evaluation of PDSP-BENCH demonstrates its benchmarking capabilities and highlights the impact of varying query complexities, hardware configurations, and workload parameters on system performance. The key observation from our experiments is that parallelism has non-linear and at times paradoxical effects on performance, as the sketch below illustrates.
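To make the notion of a parallelism sweep concrete, here is a minimal, self-contained micro-benchmark sketch in Python. It is not part of PDSP-BENCH, which benchmarks full DSP systems such as Apache Flink; it merely sweeps the parallelism degree of a hypothetical CPU-bound map-style operator and reports throughput, which on most machines stops scaling linearly once the degree exceeds the number of physical cores.

```python
# Hypothetical micro-benchmark sketch (not PDSP-BENCH): sweep the parallelism
# degree of a simple map-style operator and record throughput per degree.
import time
from multiprocessing import Pool

TUPLES = [float(i) for i in range(200_000)]

def process_tuple(x: float) -> float:
    # Stand-in for a CPU-bound streaming operator (e.g., an expensive map).
    return sum(x * k for k in range(50))

def benchmark(parallelism: int) -> float:
    # Returns throughput in tuples/second at the given parallelism degree.
    start = time.perf_counter()
    with Pool(processes=parallelism) as pool:
        pool.map(process_tuple, TUPLES, chunksize=1_000)
    return len(TUPLES) / (time.perf_counter() - start)

if __name__ == "__main__":
    for degree in (1, 2, 4, 8, 16):
        print(f"parallelism={degree:2d}  throughput={benchmark(degree):,.0f} tuples/s")
```

At small degrees throughput grows roughly linearly; beyond the core count, scheduling and coordination overheads dominate, mirroring at micro scale the non-linear effects the thesis observes in full DSP deployments.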
Second, this thesis contributes methods for performance prediction and optimization by proposing ZEROTUNE, a novel learned cost model for DSP workloads and an optimizer for parallelism tuning. It provides highly accurate cost predictions while generalizing to (unseen) heterogeneous hardware resources in the cloud. The generalizability of the model is based on transfer learning, a technique also used in Large Language Models such as ChatGPT. The main idea is to learn from so-called transferable features and a parallel graph representation, which together enable the model to generalize to unseen DSP workloads and hardware. Our extensive evaluation demonstrates ZEROTUNE's robustness and accuracy across workloads, parallelism degrees, and unseen operator parameters, as well as its training-data efficiency. The evaluations show significant speed-ups with parallelism tuning compared to existing methods. Most notably, our approach has been adopted by Amazon Redshift for query execution time prediction.
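The sketch below illustrates only the transferable-feature idea behind zero-shot cost prediction, not ZEROTUNE itself, which uses a parallel graph representation of the query plan. The feature names, the synthetic cost function, and the hardware profiles are all hypothetical; the point is that a model trained on features that describe workload and hardware characteristics, rather than identify specific deployments, can predict costs for a hardware profile it never saw during training.

```python
# Illustrative sketch of zero-shot cost prediction via transferable features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def synth_cost(tuple_width, selectivity, parallelism, cpu_ghz, net_mbps):
    # Hypothetical ground-truth latency model used only to generate data:
    # per-tuple work shrinks with parallelism and CPU speed, while the
    # repartitioning (shuffle) overhead grows with parallelism.
    work = tuple_width * selectivity / (parallelism * cpu_ghz)
    shuffle = tuple_width * parallelism / net_mbps
    return work + shuffle

def sample(hardware, n):
    # Transferable features: [tuple_width, selectivity, parallelism] for the
    # workload, plus (cpu_ghz, net_mbps) describing the hardware profile.
    workload = rng.uniform([8, 0.1, 1], [512, 1.0, 8], size=(n, 3))
    X = np.hstack([workload, np.tile(hardware, (n, 1))])
    y = np.array([synth_cost(*row) for row in X])
    return X, y

# Train on two hardware profiles ...
X_a, y_a = sample((2.0, 200.0), 1000)
X_b, y_b = sample((3.5, 1000.0), 1000)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))

# ... and predict on a third profile never seen during training.
X_c, y_c = sample((2.8, 600.0), 200)
errors = np.abs(model.predict(X_c) - y_c) / y_c
print(f"median relative error on unseen hardware: {np.median(errors):.1%}")
```

Because the model sees hardware as a vector of characteristics rather than an identity, an unseen profile is just a new point in feature space; a graph-based model like ZEROTUNE's extends the same principle to whole parallel query plans.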
| Type of entry: | Dissertation |
| ---|--- |
| Published: | 2024 |
| Author(s): | Agnihotri, Pratyush |
| Type of record: | First publication |
| Title: | Accurate Performance Modeling for Distributed Stream Processing: Methods for Performance Benchmarking and Zero-shot Parallelism Tuning in Distributed and Heterogeneous Environments |
| Language: | English |
| Referees: | Steinmetz, Prof. Dr. Ralf ; Koldehofe, Prof. Dr. Boris |
| Date of publication: | 28 October 2024 |
| Place of publication: | Darmstadt |
| Collation: | xxi, 189 pages |
| Date of oral examination: | 13 September 2024 |
| DOI: | 10.26083/tuprints-00028144 |
| URL / URN: | https://tuprints.ulb.tu-darmstadt.de/28144 |
| Status: | Publisher's version |
| URN: | urn:nbn:de:tuda-tuprints-281444 |
| Additional information: | This work has been co-funded by the German Research Foundation (DFG) as part of project C2 within the Collaborative Research Center (CRC) 1053 – MAKI. |
| Dewey Decimal Classification (DDC): | 000 Generalities, computer science, information science > 004 Computer science |
| Department(s): | 18 Department of Electrical Engineering and Information Technology > Institute of Computer Engineering > Multimedia Communications |
| Deposit date: | 28 Oct 2024 13:13 |
| Last modified: | 29 Oct 2024 13:39 |