Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling

Besnard, Jean-Baptiste ; Tarraf, Ahmad ; Barthélemy, Clément ; Cascajo, Alberto ; Jeannot, Emmanuel ; Shende, Sameer S. ; Wolf, Felix (2023)
Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling.
2nd International Workshop on Malleability Techniques Applications in High-Performance Computing (HPCMALL 2023). Hamburg, Germany (21-25 May 2023)
doi: 10.1007/978-3-031-40843-4_6
Conference or Workshop Item, Bibliography

Abstract

High-performance computing is not only a race towards the fastest supercomputers but also the science of using such massive machines productively to obtain valuable results, which underscores the importance of performance modelling and optimization. However, one-off optimization no longer appears sufficient for current architectures, where users must choose among multiple intertwined parallelism options, dedicated accelerators, and I/O solutions. In this challenging context, our paper establishes an automatic feedback loop between how applications run and how they are launched, with a specific focus on I/O. One goal is to optimize how applications are launched through moldability (launch-time malleability). As a first step in this direction, we propose a new, always-on measurement infrastructure based on state-of-the-art cloud technologies adapted for HPC. In this paper, we present the measurement infrastructure and the associated design choices. Moreover, we leverage an existing performance modelling tool to generate I/O performance models. We outline sample modelling capabilities derived from our measurement chain, showing the critical importance of measurement in future HPC systems, especially concerning resource configurations. Thanks to this precise performance-modelling infrastructure, we can improve moldability and malleability on HPC systems.

Item Type: Conference or Workshop Item
Published: 2023
Creators: Besnard, Jean-Baptiste ; Tarraf, Ahmad ; Barthélemy, Clément ; Cascajo, Alberto ; Jeannot, Emmanuel ; Shende, Sameer S. ; Wolf, Felix
Type of entry: Bibliography
Title: Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling
Language: English
Date: 25 August 2023
Publisher: Springer
Book Title: High Performance Computing: ISC High Performance 2023 International Workshops
Series Volume: 13999
Event Title: 2nd International Workshop on Malleability Techniques Applications in High-Performance Computing (HPCMALL 2023)
Event Location: Hamburg, Germany
Event Dates: 21-25 May 2023
DOI: 10.1007/978-3-031-40843-4_6
Uncontrolled Keywords: EU/BMBF|ADMIRE, Malleability, Moldability, Monitoring, performance modeling, EU, BMBF
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Parallel Programming
Date Deposited: 04 Apr 2024 09:47
Last Modified: 04 Apr 2024 09:47