Enhancing the Programmability and Performance Portability of GPU Tensor Operations

Mazaheri, Arya ; Schulte, Johannes ; Moskewicz, Matthew ; Wolf, Felix ; Jannesari, Ali (2019)
Enhancing the Programmability and Performance Portability of GPU Tensor Operations.
25th International Conference on Parallel and Distributed Computing (Euro-Par 2019). Göttingen, Germany (26.-30.08.2019)
doi: 10.1007/978-3-030-29400-7_16
Conference publication, bibliography

Abstract

Deep-learning models with convolutional networks are widely used for many artificial-intelligence tasks, thanks to the increasing adoption of high-throughput GPUs, even in mobile phones. CUDA and OpenCL are the two largely used programming interfaces for accessing the computing power of GPUs. However, attaining code portability has always been a challenge, until the introduction of the Vulkan API. Still, performance portability is not necessarily provided. In this paper, we investigate the unique characteristics of CUDA, OpenCL, and Vulkan kernels and propose a method for abstracting away syntactic differences. Such abstraction creates a single-source kernel which we use for generating code for each GPU programming interface. In addition, we expose auto-tuning parameters to further enhance performance portability. We implemented a selection of convolution operations, covering the core operations needed for deploying three common image-processing neural networks, and tuned them for NVIDIA, AMD, and ARM Mali GPUs. Our experiments show that we can generate deep-learning kernels with minimal effort for new platforms and achieve reasonable performance. Specifically, our Vulkan backend is able to provide competitive performance compared to vendor deep-learning libraries.

Item type: Conference publication
Published: 2019
Author(s): Mazaheri, Arya ; Schulte, Johannes ; Moskewicz, Matthew ; Wolf, Felix ; Jannesari, Ali
Type of entry: Bibliography
Title: Enhancing the Programmability and Performance Portability of GPU Tensor Operations
Language: English
Publication date: 13 August 2019
Publisher: Springer
Book title: Euro-Par 2019: Parallel Processing
Series: Lecture Notes in Computer Science
Series volume: 11725
Event title: 25th International Conference on Parallel and Distributed Computing (Euro-Par 2019)
Event location: Göttingen, Germany
Event date: 26.-30.08.2019
DOI: 10.1007/978-3-030-29400-7_16
Uncontrolled keywords: LOEWE|SF4.0; DFG|320898076; KTS|00.253.2014
Additional information:

Best paper award

Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Parallel Programming
Date deposited: 04 Apr 2024 11:26
Last modified: 18 Jun 2024 14:48
PPN: 519227476