IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency

Ghafouri, Saeid ; Razavi, Kamran ; Salmani, Mehran ; Sanaee, Alireza ; Lorido-Botran, Tania ; Wang, Lin ; Doyle, Joseph ; Jamshidi, Pooyan (2023)
IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency.
doi: 10.48550/arXiv.2308.12871
Report, Bibliography

Abstract

Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in ML production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of accuracy and cost in inference pipelines, providers frequently opt to consider one of them. However, the challenge lies in reconciling accuracy and cost trade-offs. To address this challenge and propose a solution to efficiently manage model variants in inference pipelines, we present IPA, an online deep-learning Inference Pipeline Adaptation system that efficiently leverages model variants for each deep learning task. Model variants are different versions of pre-trained models for the same deep learning task with variations in resource requirements, latency, and accuracy. IPA dynamically configures batch size, replication, and model variants to optimize accuracy, minimize costs, and meet user-defined latency SLAs using Integer Programming. It supports multi-objective settings for achieving different trade-offs between accuracy and cost objectives while remaining adaptable to varying workloads and dynamic traffic patterns. Extensive experiments on a Kubernetes implementation with five real-world inference pipelines demonstrate that IPA improves normalized accuracy by up to 35% with a minimal cost increase of less than 5%.
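The configuration problem described in the abstract (choosing a model variant, batch size, and replica count per pipeline stage under a latency SLA) can be illustrated with a small sketch. The abstract states that IPA uses Integer Programming; the sketch below substitutes plain exhaustive enumeration, which is only practical for toy inputs. Everything in it is an assumption made for illustration: the `Variant` fields, the linear batch-latency model, the throughput constraint, the end-to-end latency taken as the sum of stage latencies, the weights `alpha` and `beta`, and all numbers.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variant:
    name: str
    accuracy: float      # hypothetical normalized accuracy in [0, 1]
    cores: int           # hypothetical CPU cores per replica
    base_latency: float  # hypothetical seconds for batch size 1

    def latency(self, batch: int) -> float:
        # Assumed linear batch-latency model; real profiles would be measured.
        return self.base_latency * (0.5 + 0.5 * batch)


def stage_options(variants, batches, max_replicas, rate):
    """List (variant, batch, replicas) choices that sustain `rate` requests/s."""
    options = []
    for v, b in product(variants, batches):
        for r in range(1, max_replicas + 1):
            if r * b / v.latency(b) >= rate:
                options.append((v, b, r))
                break  # extra replicas would only add cost
    return options


def best_config(stages, batches, max_replicas, rate, sla, alpha, beta):
    """Return the feasible combination maximizing alpha*accuracy - beta*cost."""
    per_stage = [stage_options(vs, batches, max_replicas, rate) for vs in stages]
    best, best_score = None, float("-inf")
    for combo in product(*per_stage):
        # Assumed end-to-end latency: sum of per-stage batch latencies.
        if sum(v.latency(b) for v, b, _ in combo) > sla:
            continue
        accuracy = sum(v.accuracy for v, _, _ in combo)
        cost = sum(v.cores * r for v, _, r in combo)
        score = alpha * accuracy - beta * cost
        if score > best_score:
            best, best_score = combo, score
    return best  # None when no combination meets the SLA


# Made-up two-stage pipeline: a detector followed by a classifier.
detectors = [Variant("det-small", 0.60, 1, 0.02), Variant("det-large", 0.90, 4, 0.08)]
classifiers = [Variant("cls-small", 0.70, 1, 0.01), Variant("cls-large", 0.95, 2, 0.05)]

if __name__ == "__main__":
    for alpha, beta in [(1.0, 0.0), (0.0, 1.0), (1.0, 0.05)]:
        cfg = best_config([detectors, classifiers], (1, 2, 4), 8,
                          rate=15, sla=0.3, alpha=alpha, beta=beta)
        print(alpha, beta, [(v.name, b, r) for v, b, r in cfg])
```

Varying `alpha` and `beta` moves the result between the accuracy-first and cost-first ends of the trade-off, which mirrors the multi-objective setting the abstract mentions. A production system would replace the enumeration with a solver, since the search space grows multiplicatively with the number of stages.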

Item type:	Report
Published:	2023
Author(s):	Ghafouri, Saeid ; Razavi, Kamran ; Salmani, Mehran ; Sanaee, Alireza ; Lorido-Botran, Tania ; Wang, Lin ; Doyle, Joseph ; Jamshidi, Pooyan
Entry type:	Bibliography
Title:	IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency
Language:	English
Publication date:	24 August 2023
Publisher:	arXiv
Series:	Distributed, Parallel, and Cluster Computing
Collation:	21 pages
DOI:	10.48550/arXiv.2308.12871
Related links:	Related work
Additional information:	Version 1
Department(s)/field(s):	20 Department of Computer Science 20 Department of Computer Science > Telecooperation
TU projects:	DFG\|SFB1053\|SFB1053 TPB02 Mühlhä
Date deposited:	02 Aug 2024 08:02
Last modified:	19 Dec 2024 11:10