Adaptive GPU Array Layout Auto-Tuning

Weber, Nicolas ; Goesele, Michael (2016)
Adaptive GPU Array Layout Auto-Tuning.
SEM4HPC. Kyoto
doi: 10.1145/2916026.2916031
Conference publication, Bibliography

Abstract

Optimal performance is an important goal in compute-intensive applications. For GPU applications, achieving it requires considerable experience and knowledge of both the algorithms and the underlying hardware, which makes them an ideal target for auto-tuning approaches. We present an auto-tuner that optimizes array layouts in CUDA applications. Depending on the data and program parameters, kernels can have different optimal configurations. We therefore adjust array layouts adaptively at runtime and match or even exceed the performance of hand-optimized code. We automatically detect data characteristics to identify different performance scenarios without user input or additional programming. We perform an empirical analysis of the application to construct our decision models. In principle, our adaptive optimization requires profiling data for an extremely large number of scenarios, which cannot be exhaustively evaluated for complex applications. We solve this by extending a previously published method that efficiently profiles single kernel calls so that it finds application-wide optimal solutions. Our method optimizes applications in a few minutes, reaching speedups of up to 20% compared to hand-optimized code.
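The abstract does not spell out which layouts are tuned or how the decision model is represented. Purely as an illustration, the following C++ sketch assumes the classic choice between an array of structures (AoS) and a structure of arrays (SoA). It shows how each layout maps an (element, field) pair to a flat index, how data can be converted between layouts at runtime, and how a decision model could pick the fastest layout for a scenario from profiled timings. The names `Layout`, `flat_index`, `relayout`, `choose_layout`, `DecisionModel`, and the integer scenario ids are hypothetical and are not the authors' API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Hypothetical: two candidate layouts for n records with nf float fields each.
enum class Layout { AoS, SoA };

// Flat index of field f of element i.
// AoS: the fields of one element are contiguous.
// SoA: all values of one field are contiguous, so threads i, i+1, ... reading
//      the same field touch consecutive addresses.
std::size_t flat_index(Layout l, std::size_t i, std::size_t f,
                       std::size_t n, std::size_t nf) {
    return l == Layout::AoS ? i * nf + f : f * n + i;
}

// Copy data from one layout into another (a runtime layout change).
std::vector<float> relayout(const std::vector<float>& src, Layout from,
                            Layout to, std::size_t n, std::size_t nf) {
    assert(src.size() == n * nf);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t f = 0; f < nf; ++f)
            dst[flat_index(to, i, f, n, nf)] = src[flat_index(from, i, f, n, nf)];
    return dst;
}

// Hypothetical decision model: scenario id -> profiled runtime per layout.
using Timings = std::map<Layout, double>;
using DecisionModel = std::map<int, Timings>;

// Pick the layout with the lowest profiled runtime for this scenario;
// fall back to a default when the scenario was never profiled.
Layout choose_layout(const DecisionModel& model, int scenario, Layout fallback) {
    auto it = model.find(scenario);
    if (it == model.end()) return fallback;
    Layout best = fallback;
    double best_time = std::numeric_limits<double>::infinity();
    for (const auto& [layout, time] : it->second) {
        if (time < best_time) {
            best_time = time;
            best = layout;
        }
    }
    return best;
}
```

In a tuner like the one described, the timings would come from profiling runs and the scenario id from automatically detected data characteristics; in this sketch both are supplied by hand.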

Entry type: Conference publication
Published: 2016
Author(s): Weber, Nicolas ; Goesele, Michael
Kind of entry: Bibliography
Title: Adaptive GPU Array Layout Auto-Tuning
Language: English
Publication date: 3 August 2016
Publisher: ACM
Journal, newspaper, or series title: SEM4HPC '16 Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications
Book title: Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications
Event title: SEM4HPC
Event location: Kyoto
DOI: 10.1145/2916026.2916031
URL / URN: http://dx.doi.org/10.1145/2916026.2916031
Department(s)/field(s): 20 Department of Computer Science
20 Department of Computer Science > Graphics, Capture and Massively Parallel Computing
Excellence Initiative
Excellence Initiative > Graduate Schools
Excellence Initiative > Graduate Schools > Graduate School of Computational Engineering (CE)
Date deposited: 08 Sep 2016 06:36
Last modified: 09 Dec 2021 11:45