How to use gprof to profile software (1)

1. Xilinx Zynq-7000 brings new design ideas

In the past, our boards often carried a CPU and several FPGAs. The CPU handled configuration and management, while the FPGAs performed hardware acceleration of specific algorithms. Because of the limited communication bandwidth and the latency between CPU and FPGA, most of the interfaces between them were used only for configuration and management and could not transfer large amounts of data.

Xilinx's Zynq-7000 series solves this problem well. Each chip contains a hardened CPU core and common peripherals (DRAM controller, Gigabit Ethernet, USB 2.0 OTG, SD card controller, flash controller, UART, CAN, SPI, I2C, etc.). This part is called the Processing System (PS) and can run completely independently of the FPGA fabric. The chip also contains FPGA resources of varying capacity, called the Programmable Logic (PL), which can support logic designs of different complexity. Most importantly, there are more than 3,000 interconnect signals between the PS and the PL, including 9 AXI interfaces that provide about 100 Gb/s of communication bandwidth, along with DMA, interrupt, EMIO and other resources. This allows data to move flexibly and efficiently between the PS and the PL.

From a system design perspective, tasks can be split flexibly between software and hardware to achieve a highly optimized design. This also suggests a new development flow for embedded systems: first, take advantage of the speed and flexibility of software programming to build a working prototype of the system in software; then profile the software to find the code with the greatest impact on system performance; finally, accelerate that code in hardware on the FPGA. Xilinx also provides an HLS (High-Level Synthesis) tool that converts software code into RTL code, helping developers implement FPGA-based hardware accelerators quickly.

An important part of this flow is finding the part of the software that has the greatest impact on performance. For simple applications this is easy to judge; in spectrum analysis, for example, the FFT is clearly the algorithm most in need of optimization. In many cases, however, the software is complex and contains many layers of function calls, so it is hard to find the performance-critical code through static inspection and analysis alone. A profiling tool is then needed to collect data while the software is running and to identify the hot code statistically.

2. Profiling target

There are many profiling tools for Linux, each with its own advantages and disadvantages. Here we focus on how to profile software using gprof.

Many articles that introduce profiling tools use a simple source file written by the author, containing only a few simple function calls.
To better demonstrate the effect of profiling, we do not take that approach here; instead we use a relatively complex package, libjpeg.

libjpeg is a library written entirely in C that implements the widely used JPEG decoding, JPEG encoding and other JPEG functions. The library is maintained by the Independent JPEG Group (IJG). After compilation, in addition to the corresponding .a and .so library files, the following utility programs are generated:
cjpeg and djpeg: JPEG compression and decompression, with conversion to and from several other image file formats.
rdjpgcom and wrjpgcom: extract text comments from JFIF files and insert text comments into them.
jpegtran: a tool for lossless transformation between different JPEG formats.

cjpeg and djpeg make very good profiling targets: they are reasonably complex, but not so complex as to be daunting. A suitable JPEG image is easy to find on the Internet. The basic requirements are that it is large enough for the run to last long enough to collect profiling data, and detailed enough to exercise the software fully. Plenty of large images are available online; I chose a 2880x1800 JPEG file.
The libjpeg source can be downloaded from the project's website. The version used here is release 9, dated 13-Jan-2013. The downloaded source file is jpegsrc.v9.tar.gz.

3. Introduction to GNU profiler (gprof)

GNU profiler (gprof) is part of GNU Binutils ( https://sourceware.org/binutils/ ). Detailed documentation can be found at https://sourceware.org/binutils/docs/gprof/. Most Linux systems include this tool by default, but if you plan to run it on an embedded development board, you need to cross-compile GNU Binutils.

gprof features:
1. Generates a "flat profile" that shows the number of calls to each function and the processor time each function consumed.
2. Generates a "call graph" that shows the calling relationships between functions and how much time each call took.
3. Generates "annotated source code": a copy of the program's source code marked with the number of times each line was executed.

How gprof works:
When the -pg option is used for compiling and linking, gcc inserts a call to a function named mcount (or "_mcount" or "__mcount", depending on the compiler and operating system) into every function of the application. Every function in the application therefore calls mcount, and mcount maintains a call graph in memory: using the call stack, it records the addresses of the child (called) function and the parent (calling) function, together with information such as the number of calls. The time spent in each function is collected separately by periodic sampling.

Basic gprof workflow:
1. Compile and link with the -pg option. It can usually be added to CFLAGS and LDFLAGS in the Makefile.
2. Run the compiled binary. The arguments and the way it is run are the same as before.
3. Let the process end normally. At that point the information held in memory is written to the gmon.out file in the program's current working directory.
4. Analyze the gmon.out file with the gprof tool.

gprof parameter description:
-b: Suppresses the detailed explanation of each field in the output tables.
-p: Prints only the flat profile (the list of time consumed per function).
-q: Prints only the call graph.
-e Name: Does not print call graph information for the function Name and its subfunctions (unless they have other parent functions that are not excluded). Multiple -e flags can be given. Each -e flag can specify only one function.
-E Name: Does not print call graph information for the function Name and its subfunctions. This flag is similar to -e, but it also excludes the time spent in Name and its subfunctions from the calculation of total time and percentage time.
-f Name: Prints the call graph of the function Name and its subfunctions only. Multiple -f flags can be given. Each -f flag can specify only one function.
-F Name: Prints the call graph of the function Name and its subfunctions. This flag is similar to -f, but only the time of the printed routines is used in the total time and percentage time calculations. Multiple -F flags can be given. Each -F flag can specify only one function. The -F flag overrides the -E flag.

General usage:
gprof -b ELF_file_name gmon.out > report.txt

Description of the columns in the flat profile table of the gprof report:
% time: The percentage of the program's total running time used by this function. The values in this column should add up to 100%.
cumulative seconds: The running sum of execution time, covering this function and all functions listed above it in the table.
self seconds: The execution time of the function itself, excluding the functions it calls. The table is sorted in descending order by this column.
calls: The number of times the function was called; blank if it cannot be determined.
self ms/call: The average time in milliseconds spent in the function itself per call.
total ms/call: The average time in milliseconds spent per call in the function and the functions it calls.
name: The name of the function. Rows that are equal in self seconds and calls are sorted alphabetically by this column.

Description of the columns in the call graph table of the gprof report:
index: A unique number assigned to each function in the table.
% time: The time spent in the function and its subfunctions, as a percentage of the total time.
self: The execution time of the function itself.
children: The time spent executing its subfunctions.
called: The number of times the function was called.
name: The name of the function.

Advantages of gprof:
1. Easy to use. You only need to add the -pg option when compiling and linking. gprof works well for CPU-intensive applications whose code runs mostly in user space. It is of little use for applications that spend most of their time in kernel space or that run slowly because of external factors, such as an overloaded I/O subsystem.
2. It is part of GNU Binutils, so it is available on practically any Linux system. You can copy the gmon.out file generated on the target to the host for analysis, which saves some cross-compilation work.

Disadvantages of gprof:
1. gprof only monitors functions that were compiled and linked with the -pg option. Code running in kernel mode and third-party library functions that were not compiled with -pg cannot be monitored. gprof is therefore best suited to applications that spend most of their time in user mode. Before using gprof, it is a good idea to run the application under the Linux time command to check its real run time, user-space time and kernel-space time, and so decide whether gprof is suitable. OProfile can solve this problem.

2. gprof cannot monitor shared libraries, i.e. .so files.
sprof can be used for such files, but it is not easy to use. The workaround is to link the library statically into the application, which increases the application's code size.

3. gprof does not support multi-threaded applications. In a multi-threaded program, only the main thread's performance data is collected. The reason is that only the main thread responds to the ITIMER_PROF timer signal that gprof uses. A simple workaround for this problem exists.

4. gprof can only generate its output file (gmon.out) when the program returns normally from main or exits through a call to exit(). The reason is that the profiling runtime writes its results from a function registered with atexit(). An abnormal exit does not run the atexit() handlers, so gmon.out is not generated.

5. Function execution times are estimates obtained by sampling. When the run time is long enough this is not a big problem, and the estimate generally differs little from the actual value.
