Adaptive GPU Array Layout Auto-Tuning

Weber, Nicolas and Goesele, Michael (2016):
Adaptive GPU Array Layout Auto-Tuning.
In: Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications, ACM, SEM4HPC, Kyoto, DOI: 10.1145/2916026.2916031, [Online edition: http://dx.doi.org/10.1145/2916026.2916031],
[Conference or Workshop Item]

Abstract

Optimal performance is an important goal in compute-intensive applications. For GPU applications, this requires a lot of experience and knowledge about the algorithms and the underlying hardware, making them an ideal target for auto-tuning approaches. We present an auto-tuner that optimizes array layouts in CUDA applications. Depending on the data and program parameters, kernels can have varying optimal configurations. We thus adjust array layouts adaptively at runtime and match or even exceed the performance of hand-optimized code. We automatically detect data characteristics to identify different performance scenarios without user input or additional programming. We perform an empirical analysis of the application in order to construct our decision models. In principle, our adaptive optimization requires profiling data for an extremely high number of scenarios, which cannot be exhaustively evaluated for complex applications. We solve this by extending a previously published method that efficiently profiles single kernel calls, enhancing it to find application-wide optimal solutions. Our method is able to optimize applications in a few minutes, reaching speedups of up to 20% compared to hand-optimized code.
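
To make the idea of runtime array layout selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes only two candidate layouts (array of structures and structure of arrays), a scenario label standing in for the detected data characteristics, and a caller-supplied `measure` function standing in for kernel profiling. All names (`LAYOUTS`, `flat_index`, `build_decision_model`, `choose_layout`) are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, Iterable

# Hypothetical candidate layouts; the abstract does not list the layouts
# the auto-tuner actually considers.
LAYOUTS = ("AoS", "SoA")


def flat_index(layout: str, elem: int, field: int,
               n_elems: int, n_fields: int) -> int:
    """Map (element, field) to an offset in a single flat buffer."""
    if layout == "AoS":
        # Array of structures: all fields of one element are adjacent.
        return elem * n_fields + field
    if layout == "SoA":
        # Structure of arrays: one field of all elements is adjacent.
        return field * n_elems + elem
    raise ValueError(f"unknown layout: {layout}")


def build_decision_model(scenarios: Iterable[str],
                         measure: Callable[[str, str], float]) -> Dict[str, str]:
    """Measure every layout once per scenario and remember the fastest.

    `measure(scenario, layout)` is an assumed callback returning a runtime;
    in a real tuner it would time the kernel on the GPU.
    """
    return {s: min(LAYOUTS, key=lambda layout: measure(s, layout))
            for s in scenarios}


def choose_layout(model: Dict[str, str], scenario: str,
                  default: str = "AoS") -> str:
    """Runtime lookup; fall back to a default for unprofiled scenarios."""
    return model.get(scenario, default)
```

This sketch profiles every scenario exhaustively, which is exactly what the abstract says does not scale for complex applications; it illustrates only the lookup structure of an empirically built decision model, not the efficient profiling method the paper extends.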

Item Type: Conference or Workshop Item
Published: 2016
Creators: Weber, Nicolas and Goesele, Michael
Title: Adaptive GPU Array Layout Auto-Tuning
Language: English
Journal or Publication Title: SEM4HPC '16 Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications
Title of Book: Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications
Publisher: ACM
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Graphics, Capture and Massively Parallel Computing
Exzellenzinitiative > Graduate Schools > Graduate School of Computational Engineering (CE)
Exzellenzinitiative > Graduate Schools
Exzellenzinitiative
Event Title: SEM4HPC
Event Location: Kyoto
Date Deposited: 08 Sep 2016 06:36
DOI: 10.1145/2916026.2916031
Official URL: http://dx.doi.org/10.1145/2916026.2916031