Mazaheri, Arya ; Schulte, Johannes ; Moskewicz, Matthew ; Wolf, Felix ; Jannesari, Ali (2019)
Enhancing the Programmability and Performance Portability of GPU Tensor Operations.
25th International Conference on Parallel and Distributed Computing (Euro-Par 2019). Göttingen, Germany (26-30 August 2019)
doi: 10.1007/978-3-030-29400-7_16
Conference publication, Bibliography
Abstract
Deep-learning models with convolutional networks are widely used for many artificial-intelligence tasks, thanks to the increasing adoption of high-throughput GPUs, even in mobile phones. CUDA and OpenCL are the two most widely used programming interfaces for accessing the computing power of GPUs. However, code portability had always been a challenge until the introduction of the Vulkan API. Still, performance portability is not guaranteed. In this paper, we investigate the unique characteristics of CUDA, OpenCL, and Vulkan kernels and propose a method for abstracting away their syntactic differences. This abstraction yields a single-source kernel from which we generate code for each GPU programming interface. In addition, we expose auto-tuning parameters to further enhance performance portability. We implemented a selection of convolution operations, covering the core operations needed to deploy three common image-processing neural networks, and tuned them for NVIDIA, AMD, and ARM Mali GPUs. Our experiments show that we can generate deep-learning kernels for new platforms with minimal effort and achieve reasonable performance. Specifically, our Vulkan backend provides performance competitive with vendor deep-learning libraries.
Entry type: | Conference publication |
---|---|
Published: | 2019 |
Author(s): | Mazaheri, Arya ; Schulte, Johannes ; Moskewicz, Matthew ; Wolf, Felix ; Jannesari, Ali |
Type of entry: | Bibliography |
Title: | Enhancing the Programmability and Performance Portability of GPU Tensor Operations |
Language: | English |
Publication date: | 13 August 2019 |
Publisher: | Springer |
Book title: | Euro-Par 2019: Parallel Processing |
Series: | Lecture Notes in Computer Science |
Series volume: | 11725 |
Event title: | 25th International Conference on Parallel and Distributed Computing (Euro-Par 2019) |
Event location: | Göttingen, Germany |
Event dates: | 26-30 August 2019 |
DOI: | 10.1007/978-3-030-29400-7_16 |
Keywords: | LOEWE\|SF4.0; DFG\|320898076; KTS\|00.253.2014 |
Additional information: | Best Paper Award |
Department(s)/division(s): | 20 Department of Computer Science; 20 Department of Computer Science > Parallel Programming |
Date deposited: | 04 Apr 2024 11:26 |
Last modified: | 18 Jun 2024 14:48 |
PPN: | 519227476 |