Background
Field-Programmable Gate Arrays (FPGAs) have become a highly adaptable and powerful technology that lets engineers design custom, reconfigurable digital circuits. Their blend of flexibility, parallel processing, and fast prototyping makes them valuable across industries such as telecommunications, aerospace, and automotive. Although FPGA design has evolved rapidly over the last decade, RTL-based design in Verilog/VHDL remains the mainstream approach for production and high-performance applications. However, adoption of High-Level Synthesis (HLS) is increasing, especially in AI, networking, and rapid prototyping.
HLS adoption has grown primarily in domains where design time matters more than fine-grained hardware optimization. Examples include rapid prototyping of FPGA-based AI accelerators (e.g., Xilinx DPU), image processing engines, and DNN modeling. The core purpose of HLS is to raise the abstraction level of hardware design by incorporating software development methodologies. This approach streamlines the design process, enhances productivity, and shortens time-to-market.
In this blog, I’ll introduce Microchip’s HLS tool, SmartHLS, as an example, take a deep dive into its inner workings, and explore how to write SmartHLS C/C++ code. Although the focus is on Microchip SmartHLS, the same methodology applies to alternative tools such as Vitis HLS from AMD and Catapult HLS from Siemens.
What is High Level Synthesis (HLS)
High-Level Synthesis (HLS) is the process of generating a hardware circuit from a high-level software description (e.g. C/C++), ensuring that the resulting circuit at the RTL level performs the same function as the original software program. Automatic RTL generation simplifies and streamlines the process of transforming high-level algorithms into hardware, which makes HLS well suited to algorithmic exploration. Additionally, HLS offers the following benefits to designers: [1]
Easy Debugging: HLS enables designers to troubleshoot design issues at the software level
Code Reusability: HLS enables designers to reuse HLS IPs across different designs by reusing the associated C++ functions
Parallelism: HLS tools automatically identify opportunities for parallelism and pipelining in C/C++ code
In SmartHLS in particular, an input program written in C/C++ is fed into the tool, and the output is a circuit specification in the Verilog hardware description language. The HLS-generated Verilog can then be given to Libero, Microchip’s EDA tool for Microchip FPGAs, to be programmed onto a Microchip FPGA. Furthermore, SmartHLS can automatically integrate the generated hardware IP into Libero’s SmartDesign as an HDL+ component, accompanied by a Tcl script for seamless import. It also provides a C++ accelerator driver API, allowing embedded processors to manage the hardware efficiently.
Additionally, SmartHLS offers the option to combine user code with the driver API and cross-compile it into an executable, enabling direct deployment on a RISC-V processor within an SoC design. The following figure, which is taken from the SmartHLS user guide, demonstrates this entire end-to-end process [2]:
How High Level Synthesis Works
Generally speaking, HLS performs the software-to-hardware translation in the following steps, carried out in order:
Step No | Step Name | Description |
1 | Allocation | Define constraints for the generated hardware. These include the number of hardware resources available for specific tasks, the target clock period, and other constraints specified by the user. |
2 | Scheduling | Assigns software operations to specific clock cycles in hardware, since a software description abstracts away concepts such as clocks and finite state machines. Scheduling ensures that operations fit within the clock period and respect the data dependencies between them. |
3 | Binding | Although multiple operations of the same type (e.g., multiplications) can exist, hardware has a limited number of units for those operations. This step assigns each software computation to a specific hardware unit. |
4 | RTL Generation | This step generates a hardware description (such as Verilog) based on the results of the previous steps, resulting in a circuit design. |
The above four steps are implemented as backend passes in a compiler framework such as LLVM, which is an open-source collection of modular and reusable compiler and toolchain technologies. [3]
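To make the allocation, scheduling, and binding steps more concrete, here is a toy sketch in plain C++. It is not how SmartHLS or LLVM implement these passes; the types `Op` and `Placement`, the function `schedule_and_bind`, and the assumption that every operation takes exactly one clock cycle are all hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy model of one operation in a data-flow graph (hypothetical type).
struct Op {
    bool is_mul;            // true: multiplication, false: addition
    std::vector<int> deps;  // indices of operations whose results this one reads
};

// Result for one operation (hypothetical type).
struct Placement {
    int cycle;  // clock cycle chosen by scheduling
    int unit;   // hardware unit index chosen by binding
};

// Assumptions: ops are listed in topological order (dependencies come first),
// every operation takes one cycle, and both limits are at least 1.
// Allocation is modeled by the two limits: units available per cycle.
std::vector<Placement> schedule_and_bind(const std::vector<Op>& ops,
                                         int num_multipliers, int num_adders) {
    std::vector<Placement> out(ops.size());
    // used[kind][cycle] = units of that kind already occupied in that cycle
    std::vector<std::vector<int>> used(2);
    for (std::size_t i = 0; i < ops.size(); ++i) {
        // Scheduling: earliest cycle after all dependencies have finished.
        int cycle = 0;
        for (int d : ops[i].deps) {
            if (out[d].cycle + 1 > cycle) cycle = out[d].cycle + 1;
        }
        const int kind = ops[i].is_mul ? 0 : 1;
        const int limit = ops[i].is_mul ? num_multipliers : num_adders;
        // Respect the resource limit: move to a later cycle until a unit is free.
        while (true) {
            if (static_cast<std::size_t>(cycle) >= used[kind].size()) {
                used[kind].resize(cycle + 1, 0);
            }
            if (used[kind][cycle] < limit) break;
            ++cycle;
        }
        // Binding: take the next free unit of that kind in the chosen cycle.
        out[i] = {cycle, used[kind][cycle]};
        ++used[kind][cycle];
    }
    return out;
}
```

For an expression like (a*b) + (c*d), modeled as two multiplications feeding one addition, allowing a single multiplier forces the second multiplication into a later cycle, while allowing two lets both run in the same cycle on different units. This is the trade-off between area and latency that the allocation constraints control.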
How to write SmartHLS C/C++
To write C/C++ software that targets Microchip FPGAs, we develop programs using Microchip’s Libero SoC Design Suite, which includes SmartHLS among its features. [4] In HLS C++ design, designers can use HLS pragmas to apply optimization techniques and guide the compiler during hardware generation. These pragmas are applied directly to software constructs (such as functions, loops, arguments, or arrays) to indicate specific optimizations. For example, in the code below, the pragma instructs the SmartHLS compiler to pipeline the for loop:
#pragma HLS loop pipeline
for (i = 1; i < N; i++) {
    a[i] = a[i-1] + 2;  // each element is the previous one plus 2
}
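As a self-contained variant of the snippet above, the following sketch wraps the pipelined loop in a function so it can be compiled and checked in software first. The function name `fill_by_twos` and the array length `N = 8` are assumptions made for this example. A regular C++ compiler ignores pragmas it does not recognize, so the same source runs as ordinary software.

```cpp
#include <cstddef>

constexpr std::size_t N = 8;  // hypothetical array length

// Fills a[1..N-1] so that each element is the previous one plus 2.
// a[0] must be set by the caller.
void fill_by_twos(int a[N]) {
#pragma HLS loop pipeline
    for (std::size_t i = 1; i < N; i++) {
        a[i] = a[i - 1] + 2;  // reads the value written one iteration earlier
    }
}
```

Note that each iteration reads the result of the previous one. A loop-carried dependency like this can limit how closely consecutive iterations overlap in the pipeline, so it is worth checking how the tool reports the achieved pipeline for such loops.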
The full list of HLS pragmas provided by SmartHLS is documented here:
https://microchiptech.github.io/fpga-hls-docs/2023.2/pragmas.html
Note that when using SmartHLS to compile software into hardware, the designer must define the top-level function of the program, so that SmartHLS can compile this function and all the functions it calls into hardware. The rest of the program, including parent functions (typically the main function), serves as a software testbench. This is done by adding #pragma HLS function top to the target function:
void hw_top(int a, int b) {
#pragma HLS function top
...
...
}
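The split between hardware and testbench can be sketched as follows. This is an illustrative variant, not code from the SmartHLS documentation: the body of `hw_top`, its `int` return type, and the helpers `sw_reference` and `run_testbench` are hypothetical. In a real project the testbench logic would typically sit in main(); it is kept in a separate function here so the sketch can be called from any harness.

```cpp
#include <cstdio>

// Hypothetical top-level function: this function and everything it calls
// would be compiled into hardware.
int hw_top(int a, int b) {
#pragma HLS function top
    return a * b + a;
}

// Plain software reference, used only by the testbench.
int sw_reference(int a, int b) {
    return a * (b + 1);
}

// Testbench body: compares hw_top against the reference on a small input grid.
// Returns the number of mismatches, so 0 means the check passed.
int run_testbench() {
    int mismatches = 0;
    for (int a = -3; a <= 3; ++a) {
        for (int b = -3; b <= 3; ++b) {
            if (hw_top(a, b) != sw_reference(a, b)) {
                std::printf("mismatch at a=%d b=%d\n", a, b);
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Returning a mismatch count gives the caller a simple pass/fail signal, for example by having main() return a nonzero value when `run_testbench()` reports any mismatch.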
Summary
In this blog, we discussed what high-level synthesis is, the benefits of HLS, and how to write C/C++ code using Microchip’s SmartHLS. In the upcoming posts of this HLS blog series, I will dive deeper into the details of SmartHLS C/C++ design, accompanied by full design examples.
References
[1] https://runtimerec.com/high-level-synthesis/#elementor-toc__heading-anchor-1
[2] https://microchiptech.github.io/fpga-hls-docs/2023.2/userguide.html
[3] https://llvm.org