Weber, Nicolas ; Goesele, Michael (2016)
Adaptive GPU Array Layout Auto-Tuning.
SEM4HPC. Kyoto
doi: 10.1145/2916026.2916031
Conference publication, bibliography
Abstract
Optimal performance is an important goal in compute-intensive applications. For GPU applications, this requires a lot of experience and knowledge about the algorithms and the underlying hardware, making them an ideal target for auto-tuning approaches. We present an auto-tuner which optimizes array layouts in CUDA applications. Depending on the data and program parameters, kernels can have varying optimal configurations. We thus adjust array layouts adaptively at runtime and achieve or even exceed the performance of hand-optimized code. We automatically detect data characteristics to identify different performance scenarios without user input or additional programming. We perform an empirical analysis of the application in order to construct our decision models. Our adaptive optimization requires, in principle, profiling data for an extremely high number of scenarios, which cannot be exhaustively evaluated for complex applications. We solve this by extending a previously published method that is able to efficiently profile single kernel calls and enhance it to find application-wide optimal solutions. Our method is able to optimize applications in a few minutes, reaching speedups of up to 20% compared to hand-optimized code.
Item type: | Conference publication |
---|---|
Published: | 2016 |
Author(s): | Weber, Nicolas ; Goesele, Michael |
Entry type: | Bibliography |
Title: | Adaptive GPU Array Layout Auto-Tuning |
Language: | English |
Publication date: | 3 August 2016 |
Publisher: | ACM |
Journal, newspaper, or series title: | SEM4HPC '16 Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications |
Book title: | Proceedings of the ACM Workshop on Software Engineering Methods for Parallel and High Performance Applications |
Event title: | SEM4HPC |
Event location: | Kyoto |
DOI: | 10.1145/2916026.2916031 |
URL / URN: | http://dx.doi.org/10.1145/2916026.2916031 |
Department(s)/field(s): | 20 Department of Computer Science; 20 Department of Computer Science > Graphics, Capture and Massively Parallel Computing; Excellence Initiative; Excellence Initiative > Graduate Schools; Excellence Initiative > Graduate Schools > Graduate School of Computational Engineering (CE) |
Date deposited: | 08 Sep 2016 06:36 |
Last modified: | 09 Dec 2021 11:45 |