Comparing GPUs and FPGAs from several aspects

Below, GPUs and FPGAs are compared from several angles.
In terms of peak performance, GPUs (around 10 TFLOPS) are far ahead of FPGAs. GPU architectures are carefully optimized (using timing optimization and other techniques). Their circuits are built from standard cell libraries, the critical paths can be hand-customized, and if necessary the semiconductor fab can even fine-tune its process to the design's requirements, so many cores can run at a very high frequency at the same time. FPGAs, by contrast, are first of all strictly limited in design resources. To add a few more cores to a GPU, you only need to increase the chip area, but once you have chosen an FPGA model, its upper limit of logic resources is fixed (and floating-point operations take up a large share of those resources on an FPGA). Moreover, FPGA logic cells are built from SRAM-based lookup tables (LUTs), whose performance is much worse than that of the standard logic cells in a GPU. Finally, FPGA routing resources are also limited (some signals must be routed over long distances), whereas a GPU, built with an ASIC flow, can route wires freely; this further limits FPGA performance.
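The size of the peak-performance gap follows from a back-of-the-envelope formula: peak FLOP/s = number of arithmetic units × clock rate × FLOPs per unit per cycle. The minimal sketch below assumes hypothetical figures (3584 GPU cores at 1.4 GHz, and an FPGA whose logic resources fit only 1000 floating-point multiply-add units at 200 MHz); they are chosen only to land near the ~10 TFLOPS GPU figure above and are not the specifications of any real device.

```python
# Peak throughput model: units x clock x FLOPs per unit per cycle.
# All numbers are hypothetical illustrations, not specs of a real chip.


def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Return peak floating-point operations per second."""
    return cores * clock_hz * flops_per_cycle


# Hypothetical GPU: 3584 cores at 1.4 GHz, one fused multiply-add
# (counted as 2 FLOPs) per core per cycle.
gpu_peak = peak_flops(cores=3584, clock_hz=1.4e9, flops_per_cycle=2)

# Hypothetical FPGA: the chosen part's logic resources fit only 1000
# floating-point multiply-add units, clocked at 200 MHz.
fpga_peak = peak_flops(cores=1000, clock_hz=200e6, flops_per_cycle=2)

if __name__ == "__main__":
    print(f"GPU peak:  {gpu_peak / 1e12:.2f} TFLOPS")   # 10.04
    print(f"FPGA peak: {fpga_peak / 1e12:.2f} TFLOPS")  # 0.40
```

In this model the GPU can raise its peak simply by adding cores (more chip area), while the FPGA's `cores` term is capped by the part that was selected, which is the resource limit described above.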
Besides raw chip performance, another advantage of GPUs over FPGAs is the memory interface. The bandwidth of GPU memory interfaces (traditionally GDDR, and more recently HBM and HBM2) is much higher than that of the conventional DDR interfaces used with FPGAs, and server-side machine learning algorithms are well known to access memory frequently.
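The bandwidth difference can be estimated as peak bandwidth = bus width in bytes × transfers per second. This is a rough sketch under assumed configurations: one 64-bit DDR4-2400 channel for the FPGA side, and HBM2 stacks with a 1024-bit interface at an assumed 2 Gbit/s per pin, four of them on a hypothetical GPU. Real boards vary in channel count and speed grade.

```python
# Peak DRAM bandwidth = (bus width / 8) bytes x transfers per second.
# The memory configurations below are illustrative assumptions.


def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and MT/s rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mtps * 1e6 / 1e9


# One 64-bit DDR4-2400 channel (2400 MT/s).
ddr4_channel = peak_bandwidth_gbs(64, 2400)      # 19.2 GB/s
# One 1024-bit HBM2 stack at an assumed 2000 MT/s per pin.
hbm2_stack = peak_bandwidth_gbs(1024, 2000)      # 256 GB/s
# A hypothetical GPU with four HBM2 stacks.
hbm2_gpu = 4 * hbm2_stack                        # 1024 GB/s

if __name__ == "__main__":
    print(f"DDR4 channel: {ddr4_channel:.1f} GB/s")
    print(f"4x HBM2 GPU:  {hbm2_gpu:.0f} GB/s ({hbm2_gpu / ddr4_channel:.1f}x)")
```

The wide, stacked HBM interface wins mainly through bus width rather than per-pin speed, which is why it helps memory-bound workloads so much.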
In terms of flexibility, however, FPGAs are far better than GPUs. An FPGA's hardware can be programmed for the specific application (for example, if an application performs a large number of additions, a large share of the logic resources can be used to implement adders), whereas a GPU's hardware is fixed once it has been designed and cannot be adapted to the application. At present, most machine learning workloads suit a SIMD architecture (a single instruction processes a large amount of data in parallel), so GPUs fit them very well. Some applications, however, are MISD (a single piece of data must be processed in parallel by many instructions; Microsoft's 2014 ISCA paper gave an example of MISD-style parallel feature extraction). In such cases an FPGA configured as an MISD architecture has an advantage over a GPU. FPGA programming is not easy for programmers, though, so to let machine learning programmers use FPGAs conveniently, further development on top of the compilers provided by FPGA vendors is often required, and usually only large companies can afford to do this.
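The difference between the SIMD and MISD dataflow patterns can be sketched in a few lines. This only illustrates which operations see which data; it is not a hardware model. The feature extractors are hypothetical stand-ins, not the ones from Microsoft's paper, and Python threads are used merely to express "many operations on one datum at once" (under CPython's GIL they will not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def simd_scale(values: List[float], factor: float) -> List[float]:
    """SIMD pattern: one operation applied to many data items."""
    return [v * factor for v in values]


# MISD pattern: many different operations applied to the same data item.
# These extractors are hypothetical examples.
FEATURES: Dict[str, Callable[[str], int]] = {
    "length": len,
    "word_count": lambda doc: len(doc.split()),
    "vowel_count": lambda doc: sum(ch in "aeiou" for ch in doc.lower()),
}


def misd_extract(doc: str) -> Dict[str, int]:
    """Run every feature extractor as its own task on the same document."""
    with ThreadPoolExecutor(max_workers=len(FEATURES)) as pool:
        futures = {name: pool.submit(fn, doc) for name, fn in FEATURES.items()}
        return {name: f.result() for name, f in futures.items()}


if __name__ == "__main__":
    print(simd_scale([1.0, 2.0, 3.0], 2.0))
    print(misd_extract("FPGA vs GPU"))
```

On a GPU the MISD case maps poorly because its lanes want to execute the same instruction; on an FPGA each extractor could be laid out as its own dedicated circuit fed by the same input.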
A machine learning accelerator implemented on an FPGA can have its architecture optimized for a specific application, which gives it an advantage over GPUs; the GPU's clock speed (>1 GHz), however, is much higher than the FPGA's (~200 MHz).
The average performance therefore depends on whether the architectural advantages of an FPGA accelerator can make up for its lower clock speed. If architecture optimization on the FPGA can yield an advantage of two to three orders of magnitude over the GPU architecture, the FPGA's average performance will exceed the GPU's. For example, a paper Baidu presented at Hot Chips shows that on standard batch-data SIMD benchmarks such as matrix computations, the GPU's average performance is far better than the FPGA's; however, when the server has to handle many small requests (requests arrive frequently, but each carries only a small amount of data), the FPGA's average performance is better than the GPU's.
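The trade-off between architecture and clock speed can be written as a simple throughput model: sustained throughput = clock × peak operations per cycle × the fraction of that peak a workload actually uses ("efficiency"). The sketch below is a hypothetical model, not taken from Baidu's paper: the 1 GHz and 200 MHz clocks follow the text, while the per-cycle resources and efficiency values are invented purely for illustration.

```python
# Hypothetical model: sustained ops/s = clock x peak ops/cycle x efficiency.
# Clock rates follow the text; every other number is an assumption.

GPU_CLOCK_HZ = 1.0e9        # lower bound of the ">1 GHz" GPU clock
FPGA_CLOCK_HZ = 200e6       # "~200 MHz" FPGA clock
GPU_OPS_PER_CYCLE = 7168    # assumed: many more arithmetic units on the GPU
FPGA_OPS_PER_CYCLE = 2000   # assumed: capped by the FPGA's logic resources


def sustained_throughput(clock_hz: float, ops_per_cycle: int, efficiency: float) -> float:
    """Operations per second actually delivered on a workload."""
    return clock_hz * ops_per_cycle * efficiency


def break_even_efficiency_ratio() -> float:
    """How many times better the FPGA must use its peak to tie the GPU."""
    gpu_peak = GPU_CLOCK_HZ * GPU_OPS_PER_CYCLE
    fpga_peak = FPGA_CLOCK_HZ * FPGA_OPS_PER_CYCLE
    return gpu_peak / fpga_peak


def faster_device(gpu_efficiency: float, fpga_efficiency: float) -> str:
    """Return which device delivers more sustained throughput."""
    gpu = sustained_throughput(GPU_CLOCK_HZ, GPU_OPS_PER_CYCLE, gpu_efficiency)
    fpga = sustained_throughput(FPGA_CLOCK_HZ, FPGA_OPS_PER_CYCLE, fpga_efficiency)
    return "FPGA" if fpga > gpu else "GPU"


if __name__ == "__main__":
    print(f"break-even efficiency ratio: {break_even_efficiency_ratio():.2f}x")
    # Hypothetical workloads: (GPU efficiency, FPGA efficiency)
    workloads = {
        "large-batch matrix math": (0.80, 0.80),
        "many small requests": (0.02, 0.60),
    }
    for name, (gpu_eff, fpga_eff) in workloads.items():
        print(f"{name}: {faster_device(gpu_eff, fpga_eff)} is faster")
```

With these assumed numbers the FPGA must use its peak about 18 times more effectively to tie. The "many small requests" case clears that bar only because the assumed GPU utilization is very low, which mirrors the qualitative pattern described above; the actual break-even factor depends entirely on the devices and workload.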
In terms of power consumption, although a GPU (about 200 W) consumes far more power than an FPGA (about 10 W), a fair comparison must be made at equal performance. If FPGA architecture optimization is good enough that a single FPGA's average performance approaches a GPU's, the total power consumption of the FPGA solution is much lower than that of the GPU, and heat dissipation becomes much easier. Conversely, if twenty FPGAs are needed to match the average performance of one GPU, FPGAs have no advantage in power consumption.
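The equal-performance comparison can be checked with the 200 W and 10 W figures above. This minimal sketch assumes, idealistically, that performance scales linearly with the number of FPGAs and rounds up to whole devices; the function name is hypothetical.

```python
import math

GPU_POWER_W = 200.0   # GPU power figure from the text
FPGA_POWER_W = 10.0   # FPGA power figure from the text


def fpga_power_matching_gpu(gpu_speedup_over_one_fpga: float) -> tuple:
    """Return (number of FPGAs, total watts) needed to match one GPU.

    Idealized: assumes performance scales linearly with FPGA count.
    """
    n_fpgas = math.ceil(gpu_speedup_over_one_fpga)
    return n_fpgas, n_fpgas * FPGA_POWER_W


if __name__ == "__main__":
    for speedup in (1, 4, 20):
        n, total = fpga_power_matching_gpu(speedup)
        print(f"GPU {speedup}x faster: {n} FPGAs, {total:.0f} W vs GPU {GPU_POWER_W:.0f} W")
```

At a 20x performance gap the FPGA cluster draws the same 200 W as the GPU, which is exactly the break-even case described above; real multi-FPGA systems would also pay for interconnect and host overhead.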
Energy efficiency is compared in a similar way. It concerns the energy consumed to run a program to completion, and that energy equals power consumption multiplied by execution time. Although a GPU consumes much more power than an FPGA, if the FPGA takes dozens of times longer than the GPU to execute the same program, it has no energy-efficiency advantage. Conversely, if the hardware architecture implemented on the FPGA is well optimized for a specific machine learning application, so that the algorithm takes only a few times as long as on the GPU, or even about the same time, then the FPGA's energy efficiency will be better than the GPU's.
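The relation energy = power × execution time gives a direct break-even point. This sketch reuses the 200 W and 10 W figures from the text and assumes a hypothetical 1-second GPU runtime; only the ratio of runtimes matters for the result.

```python
GPU_POWER_W = 200.0   # GPU power figure from the text
FPGA_POWER_W = 10.0   # FPGA power figure from the text


def energy_joules(power_w: float, runtime_s: float) -> float:
    """Energy = power x execution time."""
    return power_w * runtime_s


def more_energy_efficient(fpga_slowdown: float, gpu_runtime_s: float = 1.0) -> str:
    """Compare energy for one program run.

    fpga_slowdown is the FPGA runtime divided by the GPU runtime.
    """
    gpu_energy = energy_joules(GPU_POWER_W, gpu_runtime_s)
    fpga_energy = energy_joules(FPGA_POWER_W, gpu_runtime_s * fpga_slowdown)
    if fpga_energy < gpu_energy:
        return "FPGA"
    if fpga_energy > gpu_energy:
        return "GPU"
    return "tie"


# The FPGA can be at most this many times slower before it loses on energy.
BREAK_EVEN_SLOWDOWN = GPU_POWER_W / FPGA_POWER_W

if __name__ == "__main__":
    for slowdown in (1, 3, 20, 30):
        print(f"FPGA {slowdown}x slower: {more_energy_efficient(slowdown)}")
```

With these power figures, an FPGA that is a few times slower still uses far less energy, while one that is dozens of times slower (beyond the 20x break-even) loses, as the paragraph argues.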
