Fine-Grained Memory Profiling of GPGPU Kernels

Buelow, Max von ; Guthe, Stefan ; Fellner, Dieter W. (2022)
Fine-Grained Memory Profiling of GPGPU Kernels.
In: Computer Graphics Forum, 41 (7)
Article, Bibliography

Abstract

Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations for hardware and software mandatory. While hardware vendors already use highly efficient caching architectures, software engineers usually have to organize their data accordingly to make efficient use of these caches, which requires deep knowledge of the actual hardware. In this paper we present a novel technique for fine-grained memory profiling that simulates the whole pipeline of memory flow and accumulates profiling values so that the user retains information about the potential region in the GPU program, by showing these values separately for each allocation. Our memory simulator outperforms state-of-the-art memory models of NVIDIA architectures by a factor of 2.4 for the L1 cache and 1.3 for the L2 cache, in terms of accuracy. Additionally, we find our technique of fine-grained memory profiling to be a useful tool for memory optimizations, which we successfully demonstrate for ray tracing and machine learning applications.

Item type: Article
Published: 2022
Author(s): Buelow, Max von ; Guthe, Stefan ; Fellner, Dieter W.
Type of entry: Bibliography
Title: Fine-Grained Memory Profiling of GPGPU Kernels
Language: English
Publication date: 9 October 2022
Publisher: Wiley-Blackwell
Journal or publication title: Computer Graphics Forum
Volume: 41
Issue number: 7
URL / URN: https://diglib.eg.org/handle/10.1111/cgf14671
Uncontrolled keywords: Computer graphics (CG), (Interactive) simulation (SIM), Graphics processors
Division(s): 20 Department of Computer Science
20 Department of Computer Science > Interactive Graphics Systems
Date deposited: 02 Feb 2023 07:39
Last modified: 02 Feb 2023 07:39