Capability-based Scheduling of Scientific Workflows in the Cloud

Krämer, Michel (2020)
Capability-based Scheduling of Scientific Workflows in the Cloud.
9th International Conference on Data Science, Technology and Applications.
doi: 10.5220/0009805400430054
Conference publication, Bibliography

Abstract

We present a distributed task scheduling algorithm and a software architecture for a system executing scientific workflows in the Cloud. The main challenges we address are (i) capability-based scheduling, which means that individual workflow tasks may require specific capabilities from highly heterogeneous compute machines in the Cloud, (ii) a dynamic environment where resources can be added and removed on demand, (iii) scalability in terms of scientific workflows consisting of hundreds of thousands of tasks, and (iv) fault tolerance because in the Cloud, faults can happen at any time. Our software architecture consists of loosely coupled components communicating with each other through an event bus and a shared database. Workflow graphs are converted to process chains that can be scheduled independently. Our scheduling algorithm collects distinct required capability sets for the process chains, asks the agents which of these sets they can manage, and then assigns process chains accordingly. We present the results of four experiments we conducted to evaluate if our approach meets the aforementioned challenges. We finish the paper with a discussion, conclusions, and future research opportunities. An implementation of our algorithm and software architecture is publicly available with the open-source workflow management system “Steep”.
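The three scheduling steps named in the abstract (collect the distinct required capability sets, ask the agents which sets they can manage, assign process chains accordingly) can be illustrated with a minimal Python sketch. This is not the implementation from the paper or from Steep. The names `ProcessChain`, `Agent`, `supported_sets`, and `schedule` are hypothetical, and the sketch assumes that an agent can manage a set whenever the set is a subset of the agent's own capabilities and that each agent runs one process chain at a time.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set


@dataclass
class ProcessChain:
    """Hypothetical unit of work that can be scheduled independently."""
    id: str
    required_capabilities: FrozenSet[str]


@dataclass
class Agent:
    """Hypothetical worker on a compute machine with fixed capabilities."""
    id: str
    capabilities: FrozenSet[str]
    busy: bool = False

    def supported_sets(self, capability_sets: Iterable[FrozenSet[str]]) -> Set[FrozenSet[str]]:
        # Assumption: a set is manageable if it is a subset of the
        # agent's capabilities and the agent is currently idle.
        if self.busy:
            return set()
        return {rc for rc in capability_sets if rc <= self.capabilities}


def schedule(process_chains: List[ProcessChain], agents: List[Agent]) -> Dict[str, str]:
    """Run one scheduling pass and return {process_chain_id: agent_id}."""
    pending = list(process_chains)
    # Step 1: collect the distinct required capability sets.
    distinct_sets = {pc.required_capabilities for pc in pending}
    assignments: Dict[str, str] = {}
    for agent in agents:
        # Step 2: ask the agent which of these sets it can manage.
        supported = agent.supported_sets(distinct_sets)
        if not supported:
            continue
        # Step 3: assign the first pending process chain that matches.
        for pc in pending:
            if pc.required_capabilities in supported:
                assignments[pc.id] = agent.id
                agent.busy = True
                pending.remove(pc)
                break  # safe: we stop iterating right after the removal
    return assignments


chains = [
    ProcessChain("pc1", frozenset({"gpu"})),
    ProcessChain("pc2", frozenset({"docker"})),
    ProcessChain("pc3", frozenset()),
]
agents = [
    Agent("a1", frozenset({"docker"})),
    Agent("a2", frozenset({"gpu", "docker"})),
]
result = schedule(chains, agents)
```

In this sketch the agents are only asked about the distinct capability sets rather than about every process chain, so the size of each query depends on the number of different requirement combinations and not on the number of tasks. Process chains that remain unassigned (here `pc3`) would be picked up in a later pass once an agent becomes idle.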

Item type: Conference publication
Published: 2020
Author(s): Krämer, Michel
Type of entry: Bibliography
Title: Capability-based Scheduling of Scientific Workflows in the Cloud
Language: English
Year of publication: 2020
Publisher: Science and Technology Publications
Book title: Proceedings of the 9th International Conference on Data Science, Technology and Applications - Volume 1 : DATA
Event title: 9th International Conference on Data Science, Technology and Applications
DOI: 10.5220/0009805400430054
Uncontrolled keywords: Distributed systems, Cloud computing
Department(s)/research area(s): 20 Department of Computer Science
20 Department of Computer Science > Interactive Graphics Systems
Date deposited: 22 Feb 2021 08:09
Last modified: 22 Feb 2021 08:09