Rapid Prototyping and Exploration Environment for Generating C-to-Hardware-Compilers

Stock, Florian-Wolfgang (2019):
Rapid Prototyping and Exploration Environment for Generating C-to-Hardware-Compilers.
Darmstadt, Technische Universität, [Online-Edition: https://tuprints.ulb.tu-darmstadt.de/8525],
[Ph.D. Thesis]

Abstract

There is today an ever-increasing demand for more computational power, coupled with a desire to minimize energy requirements. Hardware accelerators currently appear to be the best solution to this problem. While general-purpose computation on GPUs seems to be very successful in this area, GPUs perform adequately only when the data access patterns and the algorithms used fit the underlying architecture. ASICs, on the other hand, can yield even better results in terms of performance and energy consumption, but are very inflexible, as they are manufactured with application-specific circuitry. Field Programmable Gate Arrays (FPGAs) combine both approaches: with their application-specific hardware they provide high computational power while requiring, for many applications, less energy than a CPU or a GPU. At the same time, they are far more flexible than an ASIC due to their reconfigurability.

The only remaining problem is the programming of the FPGAs, as programming them is far more difficult than writing regular software. To allow ordinary software developers, who have at best very limited knowledge of hardware design, to make use of these devices, tools were developed that take a regular high-level language and generate hardware from it.

Among such tools, C-to-HDL compilers are a particularly widespread approach. These compilers attempt to translate common C code into a hardware description language from which a datapath is generated. Most of these compilers place many restrictions on the input and differ in their generated microarchitecture, their scheduling method, their applied optimizations, their execution model, and even their target hardware. Thus, comparing a single aspect in isolation, such as the implemented scheduling method or the generated microarchitecture, is almost impossible, as the compilers differ in so many other aspects.

This work provides a survey of the existing C-to-HDL compilers and presents a new approach to evaluating and exploring different microarchitectures for dynamic scheduling used by such compilers. From a mathematically formulated rule set, the Triad compiler generator produces a hardware generation backend for the Scale compiler framework; the hardware this backend emits uses the dynamic scheduling described by the rule set.

While the generated hardware is more than a factor of four slower than hardware from highly optimizing compilers, this environment allows easy comparison and exploration of different rule sets and of the microarchitectures of the dynamically scheduled datapaths generated from them. For demonstration purposes, a rule set modeling the COCOMA token flow model from the COMRADE 2.0 compiler was implemented. Multiple variants of it were explored, yielding savings of up to 11% of the required hardware resources.

Item Type: Ph.D. Thesis
Published: 2019
Creators: Stock, Florian-Wolfgang
Title: Rapid Prototyping and Exploration Environment for Generating C-to-Hardware-Compilers
Language: English
Place of Publication: Darmstadt
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Embedded Systems and Applications
Date Deposited: 17 Mar 2019 20:55
Official URL: https://tuprints.ulb.tu-darmstadt.de/8525
URN: urn:nbn:de:tuda-tuprints-85250
Referees: Koch, Prof. Dr. Andreas and Hochberger, Prof. Dr. Christian
Refereed / Defense / Oral Examination: 19 March 2018
Alternative Abstract (original language: German):

Today there is an ever-growing demand for more computational power, coupled with the desire to spend less and less energy on it. Hardware accelerators are currently the best solution to this. While GPUs are very successful in this area, they deliver their best performance only when the algorithms and memory access patterns are matched to the underlying architecture. ASICs, on the other hand, can provide even more performance at even lower energy consumption, but are very inflexible because of their fixed functionality. FPGAs combine both approaches: they can provide high computational power at high energy efficiency, while being more flexible than ASICs thanks to their reconfigurability. An open problem, however, remains the programming of FPGAs, as they are much harder to program than conventional software. One possible solution is C-to-HDL compilers, which translate conventional C code into a hardware description language in order to generate hardware from it. Many of these compilers restrict the supported language subset and differ in the optimizations used, the scheduling, the generated microarchitecture, their execution model, or the target hardware. These many differences make a comparison with respect to a single aspect almost impossible. This work offers a broad survey of the existing C-to-HDL compilers and presents a system that enables rapid evaluation of different approaches to dynamic scheduling. To this end, the compiler generator Triad reads a formal set of rules, from which a compiler backend for the Scale compiler framework is generated that can translate C into a hardware description language. The generated hardware uses a dynamic scheduling scheme defined by the formal rule set.

While the generated hardware is more than four times slower than that of specialized optimizing compilers, the presented environment allows a wide variety of approaches to be tried out more quickly. For demonstration purposes, the scheduling of the COMRADE 2.0 compiler was modeled in the rule set. With only little effort, a variant was explored that required up to 11% fewer hardware resources in tests.