Razavi, Kamran ; Ghafouri, Saeid ; Mühlhäuser, Max ; Jamshidi, Pooyan ; Wang, Lin (2024)
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling.
4th Workshop on Machine Learning and Systems (EuroMLSys 2024), co-located with the 19th European Conference on Computer Systems (EuroSys 2024). Athens, Greece (21–25 April 2024)
doi: 10.1145/3642970.3655833
Conference publication, Bibliography
Abstract
Mobile and IoT applications increasingly adopt deep learning inference to provide intelligence. Inference requests are typically sent to a cloud infrastructure over a wireless network that is highly variable, leading to the challenge of dynamic Service Level Objectives (SLOs) at the request level. This paper presents Sponge, a novel deep learning inference serving system that maximizes resource efficiency while guaranteeing dynamic SLOs. Sponge achieves its goal by applying in-place vertical scaling, dynamic batching, and request reordering. Specifically, we introduce an Integer Programming formulation to capture the resource allocation problem, providing a mathematical model of the relationship between latency, batch size, and resources. We demonstrate the potential of Sponge through a prototype implementation and preliminary experiments and discuss future work.
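To make the optimization concrete, below is a minimal, self-contained Python sketch of the kind of decision Sponge's Integer Programming formulation captures. It is an illustrative assumption, not the paper's exact formulation: the latency model l(b, c) = (alpha * b + beta) / c, the earliest-deadline-first reordering, and all names (`Request`, `plan`, `alpha`, `beta`) are hypothetical. The sketch brute-forces the smallest core count and the largest batch size that still let every batched request meet its remaining SLO.

```python
# Minimal sketch (not the paper's exact formulation) of the resource-
# allocation decision Sponge's Integer Programming model captures:
# pick the fewest CPU cores (in-place vertical scaling) and a batch
# size (dynamic batching) such that every request in the batch still
# meets its remaining, per-request SLO after deadline-based reordering.
# The latency model l(b, c) = (alpha * b + beta) / c and all parameter
# values below are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Request:
    arrival_ms: float  # when the request arrived at the server
    slo_ms: float      # dynamic SLO budget left for this request


def plan(requests, now_ms, alpha=2.0, beta=5.0, max_cores=16, max_batch=32):
    """Return (cores, batch_size) with minimal cores such that the whole
    batch finishes before each member's deadline, or None if infeasible."""
    # Request reordering: serve the most urgent deadlines first (EDF).
    queue = sorted(requests, key=lambda r: r.arrival_ms + r.slo_ms)
    for cores in range(1, max_cores + 1):                        # vertical-scaling knob
        for batch in range(min(max_batch, len(queue)), 0, -1):   # batching knob
            latency = (alpha * batch + beta) / cores             # assumed latency model
            finish = now_ms + latency                            # batch completes together
            if all(finish <= r.arrival_ms + r.slo_ms for r in queue[:batch]):
                return cores, batch
    return None


if __name__ == "__main__":
    reqs = [Request(0.0, 40.0), Request(2.0, 25.0), Request(3.0, 60.0)]
    print(plan(reqs, now_ms=5.0))  # -> (1, 3) under this toy model
```

A real system would hand this to an IP solver and apply the chosen core count via the container runtime's in-place resize; the toy search above only illustrates the latency/batch-size/resource trade-off the formulation encodes.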
Type of entry: | Conference publication |
---|---|
Published: | 2024 |
Author(s): | Razavi, Kamran ; Ghafouri, Saeid ; Mühlhäuser, Max ; Jamshidi, Pooyan ; Wang, Lin |
Type of record: | Bibliography |
Title: | Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling |
Language: | English |
Publication date: | 22 April 2024 |
Publisher: | ACM |
Book title: | EuroMLSys '24: Proceedings of the 4th Workshop on Machine Learning and Systems |
Event title: | 4th Workshop on Machine Learning and Systems (EuroMLSys 2024), co-located with the 19th European Conference on Computer Systems (EuroSys 2024) |
Event location: | Athens, Greece |
Event dates: | 21–25 April 2024 |
DOI: | 10.1145/3642970.3655833 |
Department(s): | 20 Department of Computer Science; 20 Department of Computer Science > Telecooperation |
TU projects: | DFG|SFB1053|SFB1053 TPA01 Mühlhä; DFG|SFB1053|SFB1053 TPB02 Mühlhä |
Date deposited: | 30 Apr 2024 09:21 |
Last modified: | 15 Aug 2024 11:56 |
PPN: | 52068883X |