Liebig, Björn (2018)
Domain-Specific High Level Synthesis of Floating-Point Computations to Resource-Shared Microarchitectures.
Technische Universität Darmstadt
Dissertation, first publication
Abstract
Many scenarios demand high processing power, often combined with a limited energy budget. One way to increase processing power without increasing power consumption is to use hardware accelerators. While implementing such an accelerator as an application-specific integrated circuit (ASIC) incurs very high development costs, reconfigurable logic devices such as FPGAs can lower development costs and reduce development time, thus shortening time to market. To reduce development costs even further, the design of the circuit itself can be partially automated using a technique called high-level synthesis (HLS). However, current high-level synthesis approaches have difficulty handling floating-point computations, especially large blocks of floating-point code.
This thesis focuses on the efficient implementation of floating-point arithmetic on FPGAs. To improve performance, new FPGA-optimized computing units are developed: this work proposes two new architectures for floating-point fused multiply-add (FMA) units, and also presents and compares two low-latency dividers based on the Goldschmidt algorithm. The proposed units significantly outperform the state of the art in terms of latency.
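The Goldschmidt algorithm named above computes a quotient N/D by repeatedly multiplying numerator and divisor by the same correction factor F = 2 - D, which drives the divisor toward 1 so that the numerator converges to the quotient. The Python function below is a minimal floating-point sketch of that iteration under stated assumptions, not the hardware dividers proposed in the thesis: the linear seed 48/17 - (32/17)·m stands in for the small lookup table a real divider would use, the function name `goldschmidt_divide` and its `iterations` parameter are hypothetical, and inputs are assumed to be finite.

```python
import math


def goldschmidt_divide(n: float, d: float, iterations: int = 4) -> float:
    """Approximate n / d by Goldschmidt iteration (illustrative sketch, finite inputs only)."""
    if d == 0.0:
        raise ZeroDivisionError("Goldschmidt sketch: divisor is zero")
    negative = (n < 0) != (d < 0)
    n, d = abs(n), abs(d)

    # Normalize the divisor: d == m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(d)
    n_i = math.ldexp(n, -e)  # apply the same power-of-two scaling to the numerator

    # Seed: linear reciprocal estimate with |1 - m * x0| <= 1/17 on [0.5, 1).
    # (Assumption: a hardware divider would read this from a lookup table instead.)
    x0 = 48.0 / 17.0 - (32.0 / 17.0) * m
    n_i *= x0
    d_i = m * x0

    for _ in range(iterations):
        f = 2.0 - d_i  # the error 1 - d_i is squared on every pass
        n_i *= f       # numerator and divisor are scaled by the same factor,
        d_i *= f       # so n_i / d_i stays equal to n / d while d_i -> 1
    return -n_i if negative else n_i
```

Because the two multiplications in each pass are independent of each other, hardware can issue them in parallel, which is one reason Goldschmidt division suits low-latency dividers. With the seed error bounded by 1/17, four passes reduce the convergence error to about (1/17)^16, below double-precision rounding error.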
Code from domains such as control engineering and numerical simulation often contains large loop bodies holding (tens of) thousands of double-precision floating-point operations. Both academic and industrial synthesis tools have great difficulty coping with such input programs. In this thesis, the academic compiler Nymble is extended to Nymble-RS, a branch with the features necessary to handle such large blocks of floating-point code.
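Mapping a loop body with thousands of operations onto a resource-shared microarchitecture means that a few floating-point units are time-multiplexed across many operations, so each operation must be assigned a start cycle that respects both data dependencies and the number of available units. The sketch below shows generic resource-constrained list scheduling with critical-path priority and fully pipelined units, a textbook HLS technique assumed here for illustration; it is not the scheduler used in Nymble-RS, and the function and parameter names are hypothetical.

```python
from collections import defaultdict


def list_schedule(ops, deps, units, latency):
    """Resource-constrained list scheduling of a dataflow graph (illustrative sketch).

    ops:     op name -> op type, e.g. {"m1": "mul"}
    deps:    op name -> list of predecessor op names (must form a DAG)
    units:   op type -> number of shared, fully pipelined units of that type
    latency: op type -> pipeline latency in cycles
    Returns op name -> start cycle.
    """
    succs = defaultdict(list)
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    # Priority: latency-weighted longest path from the op to any sink (critical path first).
    prio = {}

    def path_len(op):
        if op not in prio:
            own = latency[ops[op]]
            prio[op] = own + max((path_len(s) for s in succs[op]), default=0)
        return prio[op]

    for op in ops:
        path_len(op)

    start = {}
    cycle = 0
    while len(start) < len(ops):
        # An op is ready once every predecessor's result has left its pipeline.
        ready = [
            op for op in ops
            if op not in start
            and all(p in start and start[p] + latency[ops[p]] <= cycle
                    for p in deps.get(op, []))
        ]
        ready.sort(key=lambda op: -prio[op])
        issued = defaultdict(int)  # ops issued in this cycle, per unit type
        for op in ready:
            kind = ops[op]
            # A fully pipelined unit accepts one new op per cycle.
            if issued[kind] < units[kind]:
                start[op] = cycle
                issued[kind] += 1
        cycle += 1
    return start
```

For example, scheduling `a*b + c*d` with a single multiplier (latency 3) and a single adder (latency 2) serializes the two products, so the addition starts in cycle 4; with two multipliers it starts in cycle 3. More units shorten the schedule at the cost of area, which is the trade-off a resource-shared design makes.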
The proposed techniques are integrated into a tool chain that translates convex solvers defined in a domain-specific language to hardware. The generated accelerators reach clock frequencies of more than 200 MHz. They exceed the performance of hardware generated by a state-of-the-art high-level synthesis tool by more than 5.7x and offer speed-ups of up to 5.2x over software executing on the 800 MHz Cortex-A9 CPUs used in typical reconfigurable systems-on-chip.
Furthermore, the developed techniques are used to accelerate bioinformatics simulations defined in the CellML language, using C code as an intermediate representation. The generated hardware exceeds the performance of current-generation desktop CPUs in most cases, while requiring only 20-30% of the area of a mid-sized FPGA. At the same time, energy savings of up to 96% are achieved.
Item type | Dissertation
---|---
Published | 2018
Author(s) | Liebig, Björn
Type of entry | First publication
Title | Domain-Specific High Level Synthesis of Floating-Point Computations to Resource-Shared Microarchitectures
Language | English
Referees | Koch, Prof. Dr. Andreas ; Berekovic, Prof. Dr. Mladen
Year of publication | 2018
Place | Darmstadt
Date of oral examination | 13 March 2018
URL / URN | http://tuprints.ulb.tu-darmstadt.de/7338
URN | urn:nbn:de:tuda-tuprints-73387
Dewey Decimal Classification (DDC) subject group | 000 Generalities, computer science, information science > 004 Computer science
Department(s)/field(s) | 20 Department of Computer Science > Embedded Systems and Applications; 20 Department of Computer Science
Deposit date | 27 May 2018 19:55
Last modified | 27 May 2018 19:55