Razavi, Kamran ; Ghafouri, Saeid ; Mühlhäuser, Max ; Jamshidi, Pooyan ; Wang, Lin (2024)
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling.
4th Workshop on Machine Learning and Systems (EuroMLSys 2024), co-located with the 19th European Conference on Computer Systems (EuroSys 2024). Athens, Greece (21–25 April 2024)
doi: 10.1145/3642970.3655833
Conference publication, Bibliography
Abstract
Mobile and IoT applications increasingly adopt deep learning inference to provide intelligence. Inference requests are typically sent to a cloud infrastructure over a wireless network that is highly variable, leading to the challenge of dynamic Service Level Objectives (SLOs) at the request level. This paper presents Sponge, a novel deep learning inference serving system that maximizes resource efficiency while guaranteeing dynamic SLOs. Sponge achieves its goal by applying in-place vertical scaling, dynamic batching, and request reordering. Specifically, we introduce an Integer Programming formulation to capture the resource allocation problem, providing a mathematical model of the relationship between latency, batch size, and resources. We demonstrate the potential of Sponge through a prototype implementation and preliminary experiments and discuss future work.
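To make the optimization concrete, below is a minimal, self-contained Python sketch of the kind of decision Sponge's Integer Programming formulation captures. It is an illustrative assumption, not the paper's exact formulation: the latency model l(b, c) = (alpha * b + beta) / c, the earliest-deadline-first reordering, and all names (`Request`, `plan`, `alpha`, `beta`) are hypothetical. The sketch brute-forces the smallest core count and the largest batch size that still let every batched request meet its remaining SLO.

```python
# Minimal sketch (not the paper's exact formulation) of the resource-
# allocation decision Sponge's Integer Programming model captures:
# pick the fewest CPU cores (in-place vertical scaling) and a batch
# size (dynamic batching) such that every request in the batch still
# meets its remaining, per-request SLO after deadline-based reordering.
# The latency model l(b, c) = (alpha * b + beta) / c and all parameter
# values below are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Request:
    arrival_ms: float  # when the request arrived at the server
    slo_ms: float      # dynamic SLO budget left for this request


def plan(requests, now_ms, alpha=2.0, beta=5.0, max_cores=16, max_batch=32):
    """Return (cores, batch_size) with minimal cores such that the whole
    batch finishes before each member's deadline, or None if infeasible."""
    # Request reordering: serve the most urgent deadlines first (EDF).
    queue = sorted(requests, key=lambda r: r.arrival_ms + r.slo_ms)
    for cores in range(1, max_cores + 1):                        # vertical-scaling knob
        for batch in range(min(max_batch, len(queue)), 0, -1):   # batching knob
            latency = (alpha * batch + beta) / cores             # assumed latency model
            finish = now_ms + latency                            # batch completes together
            if all(finish <= r.arrival_ms + r.slo_ms for r in queue[:batch]):
                return cores, batch
    return None


if __name__ == "__main__":
    reqs = [Request(0.0, 40.0), Request(2.0, 25.0), Request(3.0, 60.0)]
    print(plan(reqs, now_ms=5.0))  # -> (1, 3) under this toy model
```

A real system would hand this to an IP solver and apply the chosen core count via the container runtime's in-place resize; the toy search above only illustrates the latency/batch-size/resource trade-off the formulation encodes.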
Type of entry: | Conference publication |
---|---|
Published: | 2024 |
Author(s): | Razavi, Kamran ; Ghafouri, Saeid ; Mühlhäuser, Max ; Jamshidi, Pooyan ; Wang, Lin |
Type of record: | Bibliography |
Title: | Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling |
Language: | English |
Publication date: | 22 April 2024 |
Publisher: | ACM |
Book title: | EuroMLSys '24: Proceedings of the 4th Workshop on Machine Learning and Systems |
Event title: | 4th Workshop on Machine Learning and Systems (EuroMLSys 2024), co-located with the 19th European Conference on Computer Systems (EuroSys 2024) |
Event location: | Athens, Greece |
Event dates: | 21–25 April 2024 |
DOI: | 10.1145/3642970.3655833 |
Department(s): | 20 Department of Computer Science; 20 Department of Computer Science > Telecooperation |
TU projects: | DFG|SFB1053|SFB1053 TPA01 Mühlhä; DFG|SFB1053|SFB1053 TPB02 Mühlhä |
Date deposited: | 30 Apr 2024 09:21 |
Last modified: | 15 Aug 2024 11:56 |
PPN: | 52068883X |