Khuda Bukhsh, Wasiur R. ; Kar, Sounak ; Alt, Bastian ; Rizk, Amr ; Koeppl, Heinz (2022)
Generalized Cost-Based Job Scheduling in Very Large Heterogeneous Cluster Systems.
In: IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (11)
doi: 10.26083/tuprints-00021667
Article, Secondary publication, Postprint
A newer version of this entry is available.
Abstract
We study job assignment in large, heterogeneous resource-sharing clusters of servers with finite buffers. This load balancing problem arises naturally in today's communication and big data systems, such as Amazon Web Services, Network Service Function Chains, and Stream Processing. Arriving jobs are dispatched to a server, following a load balancing policy that optimizes a performance criterion such as job completion time. Our contribution is a randomized Cost-Based Scheduling (CBS) policy in which the job assignment is driven by general cost functions of the server queue lengths. Beyond existing schemes, such as the Join the Shortest Queue (JSQ), the power of d or the SQ(d) and the capacity-weighted JSQ, the notion of CBS yields new application-specific policies such as hybrid locally uniform JSQ. As today's data center clusters have thousands of servers, exact analysis of CBS policies is tedious. In this article, we derive a scaling limit when the number of servers grows large, facilitating a comparison of various CBS policies with respect to their transient as well as steady state behavior. A byproduct of our derivations is the relationship between the queue filling proportions and the server buffer sizes, which cannot be obtained from infinite buffer models. Finally, we provide extensive numerical evaluations and discuss several applications including multi-stage systems.
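The policies named in the abstract can be read as special cases of one dispatch rule that picks a server by cost. The following is a minimal sketch of that idea, not the paper's definition of CBS: the function name `cost_based_dispatch`, the uniform tie-breaking, and the rule that a job is rejected when the chosen server's buffer is full are all assumptions made for illustration.

```python
import random


def cost_based_dispatch(queue_lengths, buffer_sizes, cost, rng, d=None):
    """Assign one arriving job to the cheapest candidate server.

    Hypothetical illustration only. `cost(i, q)` maps server index i and its
    queue length q to a number. With d=None every server is a candidate;
    otherwise d servers are sampled uniformly at random without replacement.
    Returns the chosen index, or None if that server's buffer is full.
    """
    n = len(queue_lengths)
    candidates = list(range(n)) if d is None else rng.sample(range(n), d)
    costs = {i: cost(i, queue_lengths[i]) for i in candidates}
    lowest = min(costs.values())
    # Break ties uniformly at random among the cheapest candidates.
    best = rng.choice([i for i in candidates if costs[i] == lowest])
    if queue_lengths[best] >= buffer_sizes[best]:
        return None  # assumption: a job sent to a full buffer is rejected
    queue_lengths[best] += 1
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    queues = [3, 0, 2]
    buffers = [5, 5, 5]
    capacities = [4.0, 1.0, 2.0]  # hypothetical service capacities

    # JSQ: cost is the queue length, all servers are candidates.
    print(cost_based_dispatch(queues, buffers, lambda i, q: q, rng))
    # SQ(d) with d = 2: same cost, two sampled candidates.
    print(cost_based_dispatch(queues, buffers, lambda i, q: q, rng, d=2))
    # Capacity-weighted JSQ: queue length scaled by capacity.
    print(cost_based_dispatch(queues, buffers, lambda i, q: q / capacities[i], rng))
```

Under these assumptions, changing only the `cost` callable and the sample size `d` switches between JSQ, SQ(d), and a capacity-weighted variant, which is the sense in which a cost function generalizes the individual schemes.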
Entry type: | Article |
---|---|
Published: | 2022 |
Author(s): | Khuda Bukhsh, Wasiur R. ; Kar, Sounak ; Alt, Bastian ; Rizk, Amr ; Koeppl, Heinz |
Type of entry: | Secondary publication |
Title: | Generalized Cost-Based Job Scheduling in Very Large Heterogeneous Cluster Systems |
Language: | English |
Year of publication: | 2022 |
Place: | Darmstadt |
Date of first publication: | 2020 |
Publisher: | IEEE |
Journal, newspaper, or series title: | IEEE Transactions on Parallel and Distributed Systems |
Volume: | 31 |
Issue number: | 11 |
Collation: | 14 pages |
DOI: | 10.26083/tuprints-00021667 |
URL / URN: | https://tuprints.ulb.tu-darmstadt.de/21667 |
Origin: | Secondary publication service |
Uncontrolled keywords: | Job Scheduling, performance evaluation, mean-field limit |
Status: | Postprint |
URN: | urn:nbn:de:tuda-tuprints-216679 |
Dewey Decimal Classification (DDC) subject group: | 000 Generalities, computer science, information science > 004 Computer science; 500 Natural sciences and mathematics > 530 Physics |
Department(s)/field(s): | 18 Fachbereich Elektrotechnik und Informationstechnik; 18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Nachrichtentechnik > Bioinspirierte Kommunikationssysteme; 18 Fachbereich Elektrotechnik und Informationstechnik > Institut für Nachrichtentechnik; 18 Fachbereich Elektrotechnik und Informationstechnik > Self-Organizing Systems Lab |
Date deposited: | 20 Jul 2022 14:03 |
Last modified: | 21 Jul 2022 13:28 |
Available versions of this entry
- Generalized Cost-Based Job Scheduling in Very Large Heterogeneous Cluster Systems. (deposited 20 Jul 2022 14:03) [Currently displayed]