FA2: Fast, Accurate Autoscaling for Serving Deep Learning Inference with SLA Guarantees

Razavi, Kamran ; Luthra, Manisha ; Koldehofe, Boris ; Mühlhäuser, Max ; Wang, Lin (2022)
FA2: Fast, Accurate Autoscaling for Serving Deep Learning Inference with SLA Guarantees.
28th Real-Time and Embedded Technology and Applications Symposium (RTAS 2022). Milano, Italy (04.-06.05.2022)
doi: 10.1109/RTAS54340.2022.00020
Conference publication, bibliography

Abstract

Deep learning (DL) inference has become an essential building block in modern intelligent applications. Due to the high computational intensity of DL, it is critical to scale DL inference serving systems in response to fluctuating workloads to achieve resource efficiency. Meanwhile, intelligent applications often require strict service level agreements (SLAs), which need to be guaranteed when the system is scaled. The problem is complex and has been tackled only in simple scenarios so far. This paper describes FA2, a fast and accurate autoscaler concept for DL inference serving systems. In contrast to related works, FA2 adopts a general, contrived two-phase approach. Specifically, it starts by capturing the autoscaling challenges in a comprehensive graph-based model. Then, FA2 applies targeted graph transformation and makes autoscaling decisions with an efficient algorithm based on dynamic programming. We implemented FA2 and built and evaluated a prototype. Compared with state-of-the-art autoscaling solutions, our experiments showed FA2 to achieve significant resource reduction (19% under CPUs and 25% under GPUs, on average) in combination with low SLA violations (less than 1.5%). FA2 performed close to the theoretical optimum, matching exactly the optimal decisions (with the least required resources) in 96.8% of all the cases in our evaluation.

Item type:	Conference publication
Published:	2022
Author(s):	Razavi, Kamran ; Luthra, Manisha ; Koldehofe, Boris ; Mühlhäuser, Max ; Wang, Lin
Type of entry:	Bibliography
Title:	FA2: Fast, Accurate Autoscaling for Serving Deep Learning Inference with SLA Guarantees
Language:	English
Publication date:	29 June 2022
Publisher:	IEEE
Book title:	Proceedings: 28th Real-Time and Embedded Technology and Applications Symposium
Event title:	28th Real-Time and Embedded Technology and Applications Symposium (RTAS 2022)
Event location:	Milano, Italy
Event dates:	04.-06.05.2022
DOI:	10.1109/RTAS54340.2022.00020
Department(s)/field(s):	20 Department of Computer Science
	20 Department of Computer Science > Telecooperation
	DFG Collaborative Research Centres (incl. Transregio)
	DFG Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres
	DFG Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > SFB 1053: MAKI – Multi-Mechanismen-Adaption für das künftige Internet
	DFG Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > SFB 1053: MAKI – Multi-Mechanismen-Adaption für das künftige Internet > B: Adaptation Mechanisms
	DFG Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > SFB 1053: MAKI – Multi-Mechanismen-Adaption für das künftige Internet > B: Adaptation Mechanisms > Subproject B2: Coordination and Execution
	DFG Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > SFB 1053: MAKI – Multi-Mechanismen-Adaption für das künftige Internet > C: Communication Mechanisms
	DFG Collaborative Research Centres (incl. Transregio) > Collaborative Research Centres > SFB 1053: MAKI – Multi-Mechanismen-Adaption für das künftige Internet > C: Communication Mechanisms > Subproject C2: Information-Centric Perspective
Date deposited:	15 Aug 2022 07:42
Last modified:	14 Sep 2022 14:47