
Efficient Job Scheduling for Clusters with Shared Tiered Storage

Lackner, Leah E. ; Fard, Hamid Mohammadi ; Wolf, Felix (2019)
Efficient Job Scheduling for Clusters with Shared Tiered Storage.
19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Larnaca, Cyprus (14-17 May 2019)
doi: 10.1109/CCGRID.2019.00046
Conference publication, Bibliography

Abstract

New fast storage technologies such as non-volatile memory are becoming ubiquitous in HPC systems, offering I/O bandwidth one or two orders of magnitude higher than that of traditional back-end storage systems. They can be used to speed up I/O operations considerably, an essential prerequisite for data-intensive exascale computing capabilities. However, since the overall capacity of the fast storage available in a system is limited, an individual job may not always benefit if access to fast storage implies a longer waiting time in the queue. This is obvious if fast storage is shared across the system. We therefore argue that the decision of whether or not to use fast storage should be supported by the batch scheduler, which can estimate when the amount of fast storage a job desires will become available. We present a scheduling algorithm with this functionality and show in simulations significantly reduced makespan and turnaround times in comparison to always using fast storage, always using slow back-end storage, and random storage assignment.
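
The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not specify. It assumes a deliberately simple cost model in which a job's estimated turnaround is its queue wait plus its compute time plus its I/O volume divided by the storage bandwidth. The names `Job`, `estimated_turnaround`, and `choose_tier`, and all numbers below, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description used only for this illustration."""
    compute_time: float  # seconds spent computing, independent of storage
    io_volume: float     # gigabytes read and written over the job's lifetime


def estimated_turnaround(job: Job, wait_time: float, bandwidth: float) -> float:
    """Assumed cost model: queue wait + compute time + I/O volume / bandwidth."""
    return wait_time + job.compute_time + job.io_volume / bandwidth


def choose_tier(job: Job, fast_wait: float, slow_wait: float,
                fast_bw: float, slow_bw: float) -> tuple[str, float]:
    """Pick the tier with the smaller estimated turnaround (ties go to fast)."""
    fast = estimated_turnaround(job, fast_wait, fast_bw)
    slow = estimated_turnaround(job, slow_wait, slow_bw)
    return ("fast", fast) if fast <= slow else ("slow", slow)


# Illustrative numbers: the fast tier has 10x the bandwidth of the slow tier.
job = Job(compute_time=100.0, io_volume=1000.0)

# Fast storage is free right now, so using it is the better choice.
print(choose_tier(job, fast_wait=0.0, slow_wait=0.0, fast_bw=10.0, slow_bw=1.0))

# Fast storage only frees up after a long wait, so slow storage wins.
print(choose_tier(job, fast_wait=2000.0, slow_wait=0.0, fast_bw=10.0, slow_bw=1.0))
```

In this sketch, the wait estimates are plain inputs. In a real batch scheduler they would have to be derived from the queue state and the expected release times of fast storage held by running jobs, which is the estimation step the abstract assigns to the scheduler.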

Type of entry: Conference publication
Published: 2019
Author(s): Lackner, Leah E. ; Fard, Hamid Mohammadi ; Wolf, Felix
Kind of entry: Bibliography
Title: Efficient Job Scheduling for Clusters with Shared Tiered Storage
Language: English
Date of publication: 8 July 2019
Publisher: IEEE
Book title: Proceedings: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Event title: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Event location: Larnaca, Cyprus
Event date: 14-17 May 2019
DOI: 10.1109/CCGRID.2019.00046

Uncontrolled keywords: EU|GA 785907, EU|GA 720270
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Parallel Programming
Central Facilities
Central Facilities > University Computing Center (HRZ)
Central Facilities > University Computing Center (HRZ) > High-Performance Computer
Date deposited: 04 Apr 2024 11:21
Last modified: 04 Apr 2024 11:21