Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling

Razavi, Kamran ; Ghafouri, Saeid ; Mühlhäuser, Max ; Jamshidi, Pooyan ; Wang, Lin (2024)
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling.
EuroMLSys '24: 4th Workshop on Machine Learning and Systems, co-located with the 19th European Conference on Computer Systems (EuroSys 2024). Athens, Greece (21.04.-25.04.2024)
doi: 10.1145/3642970.3655833
Conference publication, Bibliography

Abstract

Mobile and IoT applications increasingly adopt deep learning inference to provide intelligence. Inference requests are typically sent to a cloud infrastructure over a wireless network that is highly variable, leading to the challenge of dynamic Service Level Objectives (SLOs) at the request level. This paper presents Sponge, a novel deep learning inference serving system that maximizes resource efficiency while guaranteeing dynamic SLOs. Sponge achieves its goal by applying in-place vertical scaling, dynamic batching, and request reordering. Specifically, we introduce an Integer Programming formulation to capture the resource allocation problem, providing a mathematical model of the relationship between latency, batch size, and resources. We demonstrate the potential of Sponge through a prototype implementation and preliminary experiments, and discuss future work.
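The abstract only sketches the approach, so the snippet below is a rough, brute-force stand-in for the kind of allocation decision it describes: granting the fewest CPU cores (in-place vertical scaling) and the largest batch that still meets the tightest remaining SLO among queued requests. The linear latency profile, the constants ALPHA and BETA, and all function names are assumptions made for this illustration, not the paper's actual Integer Programming formulation or measured values.

# Illustrative sketch only; the latency model and coefficients are assumed.

ALPHA = 2.0     # assumed per-request processing cost (ms) on one core
BETA = 5.0      # assumed fixed per-batch overhead (ms) on one core
MAX_CORES = 8   # assumed ceiling for in-place vertical scaling
MAX_BATCH = 16  # assumed ceiling for dynamic batching


def batch_latency_ms(batch_size: int, cores: int) -> float:
    """Assumed profile: latency grows linearly with batch size and shrinks
    roughly in proportion to the CPU cores granted in place."""
    return (ALPHA * batch_size + BETA) / cores


def allocate(queued_slos_ms: list[float]):
    """Return (cores, batch_size) using the fewest cores such that even the
    tightest remaining SLO among queued requests is met, preferring the
    largest feasible batch; None if no integer assignment is feasible."""
    tightest = min(queued_slos_ms)
    max_batch = min(MAX_BATCH, len(queued_slos_ms))
    for cores in range(1, MAX_CORES + 1):                 # fewest cores first
        feasible = [b for b in range(1, max_batch + 1)
                    if batch_latency_ms(b, cores) <= tightest]
        if feasible:
            return cores, max(feasible)                   # largest SLO-safe batch
    return None


if __name__ == "__main__":
    # Requests ordered by remaining SLO budget (request reordering):
    slos = sorted([6.0, 14.0, 10.0, 20.0])
    print(allocate(slos))   # -> (2, 3) under the assumed coefficients

A real system would solve this jointly as an integer program over profiled latency models and arrival rates rather than by enumeration; the sketch only shows the shape of the latency/batch-size/resource trade-off the abstract refers to.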

Item type: Conference publication
Published: 2024
Author(s): Razavi, Kamran ; Ghafouri, Saeid ; Mühlhäuser, Max ; Jamshidi, Pooyan ; Wang, Lin
Entry type: Bibliography
Title: Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling
Language: English
Publication date: 22 April 2024
Publisher: ACM
Book title: EuroMLSys '24: Proceedings of the 4th Workshop on Machine Learning and Systems
Event title: EuroMLSys '24: 4th Workshop on Machine Learning and Systems, co-located with the 19th European Conference on Computer Systems (EuroSys 2024)
Event location: Athens, Greece
Event dates: 21.04.-25.04.2024
DOI: 10.1145/3642970.3655833
Division(s)/department(s): 20 Department of Computer Science
20 Department of Computer Science > Telecooperation
TU projects: DFG|SFB1053|SFB1053 TPA01 Mühlhä
DFG|SFB1053|SFB1053 TPB02 Mühlhä
Date deposited: 30 Apr 2024 09:21
Last modified: 30 Apr 2024 09:21