
Can perf-stat results be generated from a perf.data file?

When I want to generate performance reports using perf-stat and perf-report from the Linux tool suite perf, I run:

$ perf record -o my.perf.data myCmd
$ perf report -i my.perf.data

And:

$ perf stat myCmd

But that means I run 'myCmd' a second time, which takes several minutes. Instead, I was hoping for:

$ perf stat -i my.perf.data

But unlike most of the tools in the perf suite, I don't see a -i option for perf-stat. Is there another tool for this, or a way to get perf-report to generate similar output to perf-stat?

Asked Apr 25 '12 17:04 by garious



2 Answers

I dug into the source on kernel.org, and it looks like there's no way to get perf stat to parse perf.data.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-stat.c;h=c70d72003557f17f29345b0f219dc5ca9f572d75;hb=refs/heads/linux-2.6.33.y

If you look at line 245, you'll see the function "run_perf_stat"; the lines around 308-320 seem to be what actually does the recording and collating.

I didn't dig into this deeply enough to determine whether it's possible to enable the kind of functionality you want.

perf report does not appear to have many additional formatting capabilities either. You can check further here if you like:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-report.c;h=860f1eeeea7dbf8e43779308eaaffb1dbcf79d10;hb=refs/heads/linux-2.6.33.y

Answered Sep 22 '22 02:09 by Mike Sandford


perf stat can't parse a perf.data file, but you can ask perf report to print the header, which includes an event-count estimate: perf report --header | egrep Event\|Samples. Only events that were recorded into the perf.data file will be estimated.

perf stat uses the hardware performance monitoring unit in counting mode. perf record/perf report (with a perf.data file) use the same hardware unit, but configured in periodic overflow mode (sampling profiling). In both modes, the hardware performance counters are programmed through their control registers to count some set of performance events (for example, CPU cycles or instructions executed), and the hardware increments the counters on every event.

In counting mode, perf stat starts the counters at zero when the program starts. The hardware increments them, and perf reads the final counter values when the program exits. (In practice the OS splits counting into several segments, with the same final result: a single value for the whole program run.)

In profiling mode, perf record sets each hardware counter to a negative value, for example -200000, and registers and enables an overflow handler. The actual value is autotuned by the OS kernel to reach a target sampling frequency. After every 200000 counted events, the counter overflows from -1 to zero and raises an overflow interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the call stack in -g mode) into a ring buffer mmaped by perf. The data in that buffer is then saved into perf.data. The handler also resets the counter to -200000. So after a long enough run, many samples are stored in perf.data.

This sample set can be used to generate a statistical profile of the program (which parts of the program ran most often). We can also estimate the total number of events, since each sample was generated after 200000 events. Because the kernel autotunes the period (it tries to generate samples at 4000 Hz), the estimation is less straightforward. Use something like -c 1000000 to disable autotuning and fix the sample period.
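The two modes can be compared in a toy Python model. It assumes a fixed period (as with -c) and ignores kernel autotuning. It only illustrates the idea and is not how perf_events is implemented. The `simulate` name is hypothetical.

```python
def simulate(total_events: int, period: int) -> tuple[int, int, int]:
    """Toy model of one counter in counting mode vs. sampling (overflow) mode.

    Returns (exact_count, samples, estimated_count).
    """
    # Counting mode: the counter starts at zero and is read once at exit.
    count = 0
    # Sampling mode: the counter starts at -period; reaching zero is an overflow.
    counter = -period
    samples = 0
    for _ in range(total_events):
        count += 1
        counter += 1
        if counter == 0:         # overflow from -1 into zero
            samples += 1         # the interrupt handler records a sample
            counter = -period    # ...and re-arms the counter
    # Each sample stands for `period` events.
    estimated = samples * period
    return count, samples, estimated
```

For example, `simulate(1050, 100)` returns `(1050, 10, 1000)`. The last 50 events never complete a full period, so in this model the estimate can fall short by up to one period.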

What does perf stat show in default mode? On one x86_64 CPU, I get the following:

- the program's running time (task-clock and elapsed time)
- 3 software events (context switches, CPU migrations, page faults)
- 4 hardware counters (cycles, instructions, branches, branch-misses)

$ echo '3^123456%3' | perf stat bc
0
 Performance counter stats for 'bc':
        325.604672      task-clock (msec)         #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               181      page-faults               #    0.556 K/sec                  
       828,234,675      cycles                    #    2.544 GHz                    
     1,840,146,399      instructions              #    2.22  insn per cycle         
       348,965,282      branches                  # 1071.745 M/sec                  
        15,385,371      branch-misses             #    4.41% of all branches        
       0.326152702 seconds time elapsed
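The comment column of `perf stat` is derived from the raw values. This short Python check recomputes those ratios from the numbers above. The formulas are inferred from the matching output, not taken from perf's source. Note that the per-second rates divide by task-clock, not by elapsed time.

```python
def stat_ratios(task_clock_ms: float, elapsed_s: float, cycles: int,
                instructions: int, branches: int, branch_misses: int) -> dict[str, float]:
    """Recompute perf-stat-style derived metrics from raw counter values."""
    task_s = task_clock_ms / 1000.0
    return {
        "cpus_utilized": task_s / elapsed_s,
        "ghz": cycles / task_s / 1e9,
        "insn_per_cycle": instructions / cycles,
        "branches_m_per_sec": branches / task_s / 1e6,
        "branch_miss_pct": 100.0 * branch_misses / branches,
    }


# Values copied from the perf stat output above.
r = stat_ratios(325.604672, 0.326152702, 828_234_675,
                1_840_146_399, 348_965_282, 15_385_371)
for name, value in r.items():
    print(f"{name:>20}: {value:.3f}")
```

Rounded to perf's displayed precision, these match the comment column (0.998 CPUs, 2.544 GHz, 2.22 IPC, 1071.745 M/sec, 4.41%).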

What does perf record record in default mode? When hardware events are available, it records the cycles event. In a single wakeup (ring buffer overflow), perf saved 1293 samples into perf.data:

$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]

With perf report --header | less, perf script, and perf script -D, you can look into the contents of perf.data:

$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l 
1293

perf.data contains some timestamps, plus some additional events for program start and exit (perf script -D | egrep exec\|EXIT). However, the default perf.data does not contain enough information to fully reconstruct the perf stat output:

- Running time is recorded only as the start and exit timestamps and the timestamp of every sample.
- Software events are not recorded.
- Only a single hardware event was used: cycles, but not instructions, branches, or branch-misses.

The hardware counter that was used can be approximated, but not precisely (the real cycle count was around 820-825 million):

$ perf report --header |grep Event
# Event count (approx.): 836622729
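In frequency mode, each sample carries its own period (note PERIOD in the sample_type above). The "Event count (approx.)" figure is the sum of those periods. The following Python sketch works through the arithmetic with the numbers from this answer. The helper names are illustrative. The comparison with `perf stat` mixes two separate runs, so treat the percentage as a rough check only.

```python
def estimate_from_periods(periods: list[int]) -> int:
    """Estimated event count: each sample stands for its own period of events."""
    return sum(periods)


def mean_period(event_count: int, samples: int) -> float:
    """Average number of events represented by one sample."""
    return event_count / samples


def expected_samples(freq_hz: float, run_seconds: float) -> float:
    """Rough sample count for frequency-mode sampling over a run."""
    return freq_hz * run_seconds


report_estimate = 836_622_729  # "# Event count (approx.)" from perf report
samples = 1293                 # samples written by perf record
stat_cycles = 828_234_675      # cycles from the separate perf stat run
stat_elapsed_s = 0.326152702   # elapsed time from that perf stat run

print(f"mean period:      {mean_period(report_estimate, samples):,.0f} cycles/sample")
print(f"expected samples: ~{expected_samples(4000, stat_elapsed_s):.0f} at 4000 Hz")
print(f"estimate error:   {100 * (report_estimate - stat_cycles) / stat_cycles:+.2f}% vs perf stat")

# Hypothetical periods: with autotuning, they vary from sample to sample.
print(estimate_from_periods([600_000, 650_000, 640_000]))
```

With these inputs, the mean period comes out at about 647,040 cycles per sample, about 1305 samples would be expected at 4000 Hz, and the estimate is roughly 1% above the perf stat cycle count.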

If perf.data is recorded with non-default options, perf report can estimate more events:

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
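These estimated counts can be turned into the same ratios `perf stat` prints, which is close to what the question asks for. Here is a small Python sketch. The function name is hypothetical, and the inputs are the approximate counts above, so the ratios inherit their error.

```python
def stat_like_summary(counts: dict[str, int]) -> dict[str, float]:
    """Derive perf-stat-style ratios from estimated event counts (hypothetical helper)."""
    summary: dict[str, float] = {}
    # Only compute a ratio when both events were recorded and the divisor is non-zero.
    if counts.get("cycles") and "instructions" in counts:
        summary["insn_per_cycle"] = counts["instructions"] / counts["cycles"]
    if counts.get("branches") and "branch-misses" in counts:
        summary["branch_miss_pct"] = 100.0 * counts["branch-misses"] / counts["branches"]
    return summary


# Estimated counts from the perf report --header output above.
estimates = {
    "cycles": 834_809_036,
    "instructions": 1_834_083_643,
    "branches": 347_750_459,
    "branch-misses": 15_382_047,
}
for name, value in stat_like_summary(estimates).items():
    print(f"{name:>16}: {value:.2f}")
```

With these numbers, the sketch gives an IPC of about 2.20 and a branch-miss rate of about 4.42%. The separate perf stat run reported 2.22 and 4.41%.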

A fixed period can also be used, but the kernel may throttle some events if the value of the -c option is too low. Samples should not be generated more often than about 1000-4000 times per second:

$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses -c 1000000 bc
[ perf record: Captured and wrote 0.118 MB perf.data (3029 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 823  of event 'cycles'
# Event count (approx.): 823000000
# Samples: 1K of event 'instructions'
# Event count (approx.): 1842000000
# Samples: 349  of event 'branches'
# Event count (approx.): 349000000
# Samples: 15  of event 'branch-misses'
# Event count (approx.): 15000000
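With a fixed period, the estimate is simply samples × period. The sampling rate is the event rate divided by the period. The Python sketch below shows both calculations, with the following assumptions:

- The ~2.544 GHz cycle rate comes from the perf stat output earlier.
- The 4000 samples/s ceiling is the rough limit mentioned above, not a kernel constant.
- The instructions sample count (1842) is inferred from its estimate, because the header only shows "1K".

```python
def fixed_period_estimate(samples: int, period: int) -> int:
    """Estimated event count when every sample represents exactly `period` events."""
    return samples * period


def samples_per_second(events_per_second: float, period: int) -> float:
    """How often a counter with this period overflows, i.e. takes a sample."""
    return events_per_second / period


# Per-event sample counts for the -c 1000000 run above (instructions inferred).
runs = {"cycles": 823, "instructions": 1842, "branches": 349, "branch-misses": 15}
for event, n in runs.items():
    print(f"{event:>14}: ~{fixed_period_estimate(n, 1_000_000):,}")

# Consistency check: the per-event samples should add up to perf record's total.
print("total samples:", sum(runs.values()))

# Assumed cycle rate ~2.544 GHz; assumed ceiling ~4000 samples/s.
rate = samples_per_second(2.544e9, 1_000_000)
print(f"cycles at -c 1000000: ~{rate:.0f} samples/s, under 4000: {rate <= 4000}")
```

The per-event samples add up to 3029, matching the total reported by perf record. That total supports the inferred 1842 for instructions.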
Answered Sep 21 '22 02:09 by osgx