Can perf-stat results be generated from a perf.data file?

When I want to generate performance reports using perf-stat and perf-report from the Linux tool suite perf, I run:

$ perf record -o my.perf.data myCmd
$ perf report -i my.perf.data

And:

$ perf stat myCmd

But that means I run 'myCmd' a second time, which takes several minutes. Instead, I was hoping for:

$ perf stat -i my.perf.data

But unlike most of the tools in the perf suite, I don't see a -i option for perf-stat. Is there another tool for this, or a way to get perf-report to generate similar output to perf-stat?

asked Apr 25 '12 17:04

2 Answers

I dug into the source on kernel.org, and it looks like there's no way to get perf stat to parse perf.data:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-stat.c;h=c70d72003557f17f29345b0f219dc5ca9f572d75;hb=refs/heads/linux-2.6.33.y

If you look at line 245 you'll see the function "run_perf_stat", and the lines around 308-320 seem to be what actually does the recording and collating.

I didn't dig into this deeply enough to determine whether it's possible to enable the kind of functionality you want.

It does not look like perf report has many additional formatting capabilities either. You can check further here:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-report.c;h=860f1eeeea7dbf8e43779308eaaffb1dbcf79d10;hb=refs/heads/linux-2.6.33.y

answered Sep 22 '22 02:09

perf stat can't parse a perf.data file, but you can ask perf report to print the header with an estimated event count: perf report --header | egrep Event\|Samples. Only events that were recorded into the perf.data file will be estimated.

perf stat uses the hardware performance monitoring unit in counting mode, while perf record/perf report with a perf.data file use the same hardware unit configured in periodic overflow mode (sampling profiling). In both modes, each hardware performance counter is programmed through its control register to count a particular performance event (for example, CPU cycles or instructions executed), and the hardware increments the counter on every such event.

In counting mode, perf stat starts the counters at zero when the program starts; the hardware increments them, and perf reads the final counter values when the program exits. (In practice the OS splits counting into several segments, but the result is the same: a single value for the full program run.)

In profiling mode, perf record sets every hardware counter to some negative value, for example -200000, and registers and enables an overflow handler (the actual value is autotuned by the OS kernel to reach a target sampling frequency). After every 200000 counted events, the counter overflows from -1 to zero and generates an overflow interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the call stack in -g mode) into a ring buffer (mmaped by perf), whose contents are saved into perf.data. The handler also resets the counter to -200000.

So after a long enough run, many samples are stored in perf.data. This sample set can be used to generate a statistical profile of the program (which parts of the program ran most often). It also gives an estimate of the total event count: if a sample was generated every 200000 events, the total is roughly the number of samples times 200000. Because the kernel autotunes the period (it tries to generate samples at 4000 Hz), the estimate is harder to make; use something like -c 1000000 to set a fixed sample period and disable autotuning.
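
The counter arithmetic of the two modes can be modelled in a few lines of Python. This is only a toy model of the behaviour described above, not how perf or the kernel is implemented; the function names are made up for illustration, and the event total is an assumed value (borrowed from the cycles line of the perf stat output shown further down).

```python
def counting_mode(total_events):
    """perf stat style: the counter starts at zero and is read once at exit."""
    counter = 0
    counter += total_events              # hardware increments once per event
    return counter


def sampling_mode(total_events, period):
    """perf record style: the counter starts at -period and overflows at zero.

    Returns the number of samples the overflow handler would record.
    """
    counter = -period
    samples = 0
    remaining = total_events
    while remaining > 0:
        step = min(remaining, -counter)  # events until overflow or program exit
        counter += step
        remaining -= step
        if counter == 0:                 # overflow interrupt fires
            samples += 1                 # handler records one sample
            counter = -period            # and re-arms the counter
    return samples


total = 828_234_675                      # assumed event total, for illustration
period = 1_000_000                       # fixed period, as with -c 1000000
samples = sampling_mode(total, period)
estimate = samples * period              # what can be rebuilt from the samples
print(counting_mode(total), samples, estimate)
```

In this model the estimate can only undercount, and by less than one period, because the events counted after the last overflow never produce a sample.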

What does perf stat show in default mode? On one x86_64 CPU I get the running time of the program (task-clock and elapsed time), 3 software events (context switches, CPU migrations, page faults), and 4 hardware counters (cycles, instructions, branches, branch-misses):

$ echo '3^123456%3' | perf stat bc
0
 Performance counter stats for 'bc':
        325.604672      task-clock (msec)         #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               181      page-faults               #    0.556 K/sec                  
       828,234,675      cycles                    #    2.544 GHz                    
     1,840,146,399      instructions              #    2.22  insn per cycle         
       348,965,282      branches                  # 1071.745 M/sec                  
        15,385,371      branch-misses             #    4.41% of all branches        
       0.326152702 seconds time elapsed
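
The values after the # in that output appear to be simple ratios of the raw counts. A small Python check, assuming that interpretation and copying the numbers from the output above, reproduces three of them:

```python
# Raw values copied from the perf stat output above.
task_clock_ms = 325.604672
cycles = 828_234_675
instructions = 1_840_146_399
branches = 348_965_282
branch_misses = 15_385_371

seconds = task_clock_ms / 1000.0
ghz = cycles / seconds / 1e9                  # cycles per second, in GHz
ipc = instructions / cycles                   # instructions per cycle
miss_pct = 100.0 * branch_misses / branches   # share of mispredicted branches

print(f"{ghz:.3f} GHz, {ipc:.2f} insn per cycle, {miss_pct:.2f}% branch misses")
```

None of these ratios can be rebuilt from a default perf.data, since that file holds samples for only one of the counters involved.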

What does perf record record in default mode? When hardware events are available, it records the cycles event. In a single wakeup (ring buffer overflow), perf saved 1293 samples into perf.data:

$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]

With perf report --header | less, perf script, and perf script -D you can look into the contents of perf.data:

$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l 
1293

perf.data contains some timestamps and some additional events for program start and exit (perf script -D | egrep exec\|EXIT), but a default perf.data does not hold enough information to fully reconstruct the perf stat output. Running time is recorded only as timestamps of program start and exit and of every sample; software events are not recorded; and only a single hardware event is used (cycles, but not instructions, branches, or branch-misses). The recorded hardware counter can be approximated, but not precisely (the real cycle count was around 820-825 million):

$ perf report --header |grep Event
# Event count (approx.): 836622729
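
As a rough sanity check, the numbers above fit the 4000 Hz sampling frequency shown in the header. The sketch below is plain arithmetic on values copied from the outputs; it assumes the recorded run took about as long as the separate perf stat run, which is only approximately true.

```python
sample_freq_hz = 4000          # sample_freq from perf report --header
samples = 1293                 # from perf record and perf script | wc -l
approx_events = 836_622_729    # "Event count (approx.)" for cycles
task_clock_s = 0.325604672     # from the separate perf stat run (assumption)

expected_samples = sample_freq_hz * task_clock_s   # samples at a steady 4000 Hz
mean_period = approx_events / samples              # average cycles per sample

print(f"expected about {expected_samples:.0f} samples, got {samples}")
print(f"mean sampling period about {mean_period:.0f} cycles")
```

The mean period is only an average: with autotuning the period of individual samples varies, which is part of why the estimate is approximate.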

With a non-default recording, perf report can estimate more events:

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
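
To compare these estimates with the exact counts, the header lines can be parsed with a short script. This is a sketch: parse_event_counts is a hypothetical helper, the regular expressions assume exactly the line format shown above, and the perf stat values come from the earlier, separate run.

```python
import re

# Header lines copied from the perf report output above.
header = """\
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
"""


def parse_event_counts(text):
    """Map each event name to the approximate count on the line after it."""
    counts = {}
    event = None
    for line in text.splitlines():
        m = re.match(r"# Samples:.* of event '([^']+)'", line)
        if m:
            event = m.group(1)
            continue
        m = re.match(r"# Event count \(approx\.\):\s*(\d+)", line)
        if m and event is not None:
            counts[event] = int(m.group(1))
            event = None
    return counts


# Exact counts from the earlier perf stat run, for comparison.
stat_counts = {
    "cycles": 828_234_675,
    "instructions": 1_840_146_399,
    "branches": 348_965_282,
    "branch-misses": 15_385_371,
}

estimates = parse_event_counts(header)
for name, est in estimates.items():
    diff_pct = 100.0 * (est - stat_counts[name]) / stat_counts[name]
    print(f"{name:15s} {est:>13,d}  {diff_pct:+.2f}% vs perf stat")
```

For this short run every estimate lands within 1% of the perf stat value, though the two runs are not identical, so part of the difference is ordinary run-to-run variation.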

A fixed period can be used, but the kernel may limit some events if the value of the -c option is too low (samples should not be generated more often than 1000-4000 times per second):

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses -c 1000000 bc
[ perf record: Captured and wrote 0.118 MB perf.data (3029 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 823  of event 'cycles'
# Event count (approx.): 823000000
# Samples: 1K of event 'instructions'
# Event count (approx.): 1842000000
# Samples: 349  of event 'branches'
# Event count (approx.): 349000000
# Samples: 15  of event 'branch-misses'
# Event count (approx.): 15000000
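
With a fixed period, each event count is just the number of samples times the period, so the per-event sample counts can be recovered by division. A quick Python check on the numbers above (nothing here calls perf) shows that the "1K" for instructions stands for 1842 samples and that the four events add up to the 3029 samples reported by perf record:

```python
period = 1_000_000     # value passed to -c

# "Event count (approx.)" values copied from the output above.
event_counts = {
    "cycles": 823_000_000,
    "instructions": 1_842_000_000,
    "branches": 349_000_000,
    "branch-misses": 15_000_000,
}

# Samples per event = event count / fixed period.
samples = {name: count // period for name, count in event_counts.items()}
total_samples = sum(samples.values())

print(samples)
print(total_samples)
```

The price of the fixed period is resolution: every estimate is a multiple of 1000000, so a rare event such as branch-misses is known only to about two significant digits.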

answered Sep 21 '22 02:09

osgx

Tags: performance, linux, profiling, performancecounter, perf

Asked by garious; answered by Mike Sandford and osgx.