
Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS

Dewald, Florian ; Rohde, Johanna ; Hochberger, Christian ; Mantel, Heiko (2022)
Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS.
In: ACM Transactions on Reconfigurable Technology and Systems, 15 (3)
doi: 10.1145/3501801
Article, Bibliography

Abstract

High-level synthesis (HLS) can be used to create hardware accelerators for compute-intensive software parts such as loop structures. Usually, this process requires a significant amount of user interaction to steer kernel selection and optimizations, which can be tedious and time-consuming. In this article, we present an approach that fully autonomously finds independent loop iterations and reductions to create parallelized accelerators. We combine static analysis with information available only at runtime to maximize the parallelism exploited by the created accelerators. For loops where we see potential for parallelism, we create fully parallelized kernel implementations. If static information does not suffice to deduce independence, we assume independence at compile time. We verify this assumption with statically created checks that are evaluated dynamically at runtime, before the optimized kernel is used. In our evaluation, we achieve speedups for five out of seven benchmarks. With four loop iterations running in parallel, we achieve speedups of up to the ideal 4× and 2.27× on average, both relative to an unoptimized accelerator.
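The pattern of assuming independence at compile time and confirming it at runtime resembles classic loop versioning. The C sketch below is illustrative only and is not the authors' HLS tool: it assumes the only unknown is whether a destination and a source array overlap, and the names `ranges_disjoint`, `kernel_seq`, `kernel_par`, `run_kernel` and the loop body are hypothetical. `LANES = 4` mirrors the four parallel loop iterations used in the evaluation. The parallel kernel models hardware lanes by reading all four inputs before writing any output, which is why it may only run after the check passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: four loop iterations per step, mirroring the evaluation setup. */
#define LANES 4

/* Runtime check: the written range dst[0..n) and the read range src[0..n)
   must not overlap; otherwise iterations may depend on each other. */
static int ranges_disjoint(const int *dst, const int *src, size_t n) {
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return d + bytes <= s || s + bytes <= d;
}

/* Sequential fallback: correct even when dst and src overlap. */
static void kernel_seq(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2 + 1;
}

/* Parallel version: each step models LANES hardware lanes that all read
   their inputs before any lane writes its output. This is only correct
   if the iterations are independent. */
static void kernel_par(int *dst, const int *src, size_t n) {
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        int tmp[LANES];
        for (size_t l = 0; l < LANES; l++)
            tmp[l] = src[i + l] * 2 + 1;   /* all lanes read */
        for (size_t l = 0; l < LANES; l++)
            dst[i + l] = tmp[l];           /* then all lanes write */
    }
    for (; i < n; i++)                     /* leftover iterations */
        dst[i] = src[i] * 2 + 1;
}

/* Dispatcher: use the optimized kernel only if the runtime check confirms
   the independence assumed at compile time. Returns 1 if it was used. */
int run_kernel(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (ranges_disjoint(dst, src, n)) {
        kernel_par(dst, src, n);
        return 1;
    }
    kernel_seq(dst, src, n);
    return 0;
}
```

For example, calling `run_kernel(c + 1, c, 10)` on an array `c` makes the ranges overlap, so the check fails and the sequential kernel preserves the loop-carried dependence `c[i + 1] = 2 * c[i] + 1`. With two separate arrays, the check passes and the four-lane kernel is used.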

Entry type: Article
Published: 2022
Author(s): Dewald, Florian ; Rohde, Johanna ; Hochberger, Christian ; Mantel, Heiko
Entry kind: Bibliography
Title: Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS
Language: English
Publication date: 4 February 2022
Publisher: ACM
Journal: ACM Transactions on Reconfigurable Technology and Systems
Volume: 15
Issue: 3
DOI: 10.1145/3501801

Keywords: loop parallelization, scalar evolution analysis, high-level synthesis, system-on-chip, FPGA
Additional information:

Article no.: 31

Department(s)/field(s): 18 Department of Electrical Engineering and Information Technology
18 Department of Electrical Engineering and Information Technology > Institute of Computer Engineering
18 Department of Electrical Engineering and Information Technology > Institute of Computer Engineering > Computer Systems Group
Date deposited: 11 Apr 2024 12:35
Last modified: 11 Apr 2024 12:35