
Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling

Besnard, Jean-Baptiste ; Tarraf, Ahmad ; Barthélemy, Clément ; Cascajo, Alberto ; Jeannot, Emmanuel ; Shende, Sameer S. ; Wolf, Felix (2023)
Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling.
2nd International Workshop on Malleability Techniques Applications in High-Performance Computing (HPCMALL 2023). Hamburg, Germany (21-25 May 2023)
doi: 10.1007/978-3-031-40843-4_6
Conference publication, bibliography

Abstract

High-performance computing is not only a race towards the fastest supercomputers but also the science of using such massive machines productively to acquire valuable results – outlining the importance of performance modelling and optimization. However, it appears that more than punctual optimization is required for current architectures, with users having to choose between multiple intertwined parallelism possibilities, dedicated accelerators, and I/O solutions. Witnessing this challenging context, our paper establishes an automatic feedback loop between how applications run and how they are launched, with a specific focus on I/O. One goal is to optimize how applications are launched through moldability (launch-time malleability). As a first step in this direction, we propose a new, always-on measurement infrastructure based on state-of-the-art cloud technologies adapted for HPC. In this paper, we present the measurement infrastructure and associated design choices. Moreover, we leverage an existing performance modelling tool to generate I/O performance models. We outline sample modelling capabilities, as derived from our measurement chain showing the critical importance of the measurement in future HPC systems, especially concerning resource configurations. Thanks to this precise performance model infrastructure, we can improve moldability and malleability on HPC systems.

Item type: Conference publication
Published: 2023
Author(s): Besnard, Jean-Baptiste ; Tarraf, Ahmad ; Barthélemy, Clément ; Cascajo, Alberto ; Jeannot, Emmanuel ; Shende, Sameer S. ; Wolf, Felix
Entry kind: Bibliography
Title: Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling
Language: English
Publication date: 25 August 2023
Publisher: Springer
Book title: High Performance Computing: ISC High Performance 2023 International Workshops
Series volume: 13999
Event title: 2nd International Workshop on Malleability Techniques Applications in High-Performance Computing (HPCMALL 2023)
Event location: Hamburg, Germany
Event dates: 21-25 May 2023
DOI: 10.1007/978-3-031-40843-4_6
Uncontrolled keywords: EU/BMBF|ADMIRE, Malleability, Moldability, Monitoring, performance modeling, EU, BMBF
Division(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Parallel Programming
Date deposited: 04 Apr 2024 09:47
Last modified: 04 Apr 2024 09:47