When I want to generate performance reports using perf-stat and perf-report from the Linux tool suite perf, I run:
$ perf record -o my.perf.data myCmd
$ perf report -i my.perf.data
And:
$ perf stat myCmd
But that means I run 'myCmd' a second time, which takes several minutes. Instead, I was hoping for:
$ perf stat -i my.perf.data
But unlike most of the tools in the perf suite, I don't see a -i option for perf-stat. Is there another tool for this, or a way to get perf-report to generate similar output to perf-stat?
The perf tool has a built-in perf.data parser and printer, the "script" subcommand. perf-script: read perf.data (created by perf record) and display trace output. This command reads the input file and displays the trace recorded.
perf report is able to auto-detect whether a perf.data file contains branch stacks, and it will automatically switch to the branch view mode unless --no-branch-stack is used. --branch-history: add the addresses of sampled taken branches to the callstack. This makes it possible to examine the path the program took to each sample.
Perf consists of kernel code and a userspace tool. The tool records the data to a file which can be analyzed later. Understanding this data format is necessary for individual software performance analysis. This report provides information about the data structures used to read the data file.
The perf tool can be used to collect profiles on a per-thread, per-process and per-cpu basis. There are several commands associated with sampling: record, report, annotate. You must first collect the samples using perf record. This generates an output file called perf.data.
I dug into the source on kernel.org and it looks like there's no way to get perf stat to parse perf.data:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-stat.c;h=c70d72003557f17f29345b0f219dc5ca9f572d75;hb=refs/heads/linux-2.6.33.y
If you look at line 245 you'll see the function "run_perf_stat" and the lines around 308-320 seem to be what actually do the recording and collating.
I didn't dig into this hard enough to determine if it's possible to enable the kind of functionality that you desire.
It does not look like perf report has a lot of additional formatting capabilities to it. You can check further if you like here:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=tools/perf/builtin-report.c;h=860f1eeeea7dbf8e43779308eaaffb1dbcf79d10;hb=refs/heads/linux-2.6.33.y
perf stat can't be used to parse a perf.data file, but you can ask perf report to print the header with an event count estimate, using perf report --header |egrep Event\|Samples. Only events recorded into the perf.data file will be estimated.
perf stat uses the hardware performance monitoring unit in counting mode, while perf record / perf report with a perf.data file use the same hardware unit configured in periodic overflow mode (sampling profiling). In both modes the hardware performance counters are programmed through their control registers to count some set of performance events (for example CPU cycles or instructions executed), and the hardware increments each counter on every event.
In counting mode, perf stat uses counters initially set to zero at program start; the hardware increments them and perf reads the final counter value at program exit (in practice the OS splits the counting into several segments, with a similar final result: a single value for the full program run).
In profiling mode, perf record sets every hardware counter to some negative value, for example -200000, and registers and enables an overflow handler (the actual value is autotuned by the OS kernel to reach some target frequency). After every 200000 counted events the counter overflows from -1 to zero and generates an overflow interrupt. The perf_events interrupt handler records a "sample" (current time, pid, instruction pointer, and optionally the callstack in -g mode) into a ring buffer (mmapped by perf), from which the data is saved into perf.data. The handler also resets the counter to -200000. So after a long enough run there will be many samples stored in perf.data. This sample set can be used to generate a statistical profile of the program (which parts of the program ran most often). We can also get an estimate of the total event count if every sample was generated every 200000 events. Because the kernel autotunes the period (it tries to generate samples at 4000 Hz), the estimation is more difficult; use something like -c 1000000 to disable autotuning of the sample period.
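The estimate described above is plain arithmetic. Here is a minimal sketch in Python, assuming either a fixed sample period (as with -c) or a list of per-sample periods (the PERIOD field stored with each sample when the period is autotuned). The function names are mine for illustration and are not part of perf:

```python
from typing import Iterable


def estimate_fixed_period(num_samples: int, period: int) -> int:
    """With a fixed period, every sample stands for `period` events."""
    return num_samples * period


def estimate_variable_period(periods: Iterable[int]) -> int:
    """With autotuning, each sample carries its own period; add them up."""
    return sum(periods)


# Hypothetical run: 823 samples recorded with -c 1000000
print(estimate_fixed_period(823, 1_000_000))

# Hypothetical autotuned run: three samples with different periods
print(estimate_variable_period([200_000, 250_000, 150_000]))
```

Either way the result is only as fine-grained as the period: events counted after the last overflow never produce a sample, so they are missing from the estimate.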
What does perf stat show in default mode? On an x86_64 CPU I get: the running time of the program (task-clock and elapsed), 3 software events (context switches, CPU migrations, page faults) and 4 hardware counters (cycles, instructions, branches, branch-misses):
$ echo '3^123456%3' | perf stat bc
0
Performance counter stats for 'bc':
325.604672 task-clock (msec) # 0.998 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
181 page-faults # 0.556 K/sec
828,234,675 cycles # 2.544 GHz
1,840,146,399 instructions # 2.22 insn per cycle
348,965,282 branches # 1071.745 M/sec
15,385,371 branch-misses # 4.41% of all branches
0.326152702 seconds time elapsed
What does perf record record in default mode? When hardware events are available, it is the cycles event. In a single wakeup (ring buffer overflow) perf saved 1293 samples into perf.data:
$ echo '3^123456%3' | perf record bc
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.049 MB perf.data (1293 samples) ]
With perf report --header|less, perf script and perf script -D you can take a look into the perf.data content:
$ perf report --header |grep event
# event : name = cycles:uppp, , size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD ...
# Samples: 1K of event 'cycles:uppp'
$ perf script 2>/dev/null |grep cycles|wc -l
1293
There are some timestamps inside perf.data and some additional events for program start and exit (perf script -D |egrep exec\|EXIT), but there is not enough information in a default perf.data to fully reconstruct the perf stat output. Running time is recorded only as timestamps of start and exit and of every event sample, software events are not recorded, and only a single hardware event was used (cycles, but not instructions, branches or branch-misses). The count of the recorded hardware event can be approximated, but not precisely (the real cycle count was around 820-825 million):
$ perf report --header |grep Event
# Event count (approx.): 836622729
With non-default recording of perf.data, more events can be estimated by perf report:
$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses bc
[ perf record: Captured and wrote 0.238 MB perf.data (5164 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
# Samples: 1K of event 'branches'
# Event count (approx.): 347750459
# Samples: 1K of event 'branch-misses'
# Event count (approx.): 15382047
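If you want these estimates in a script rather than on screen, the header lines can be scraped. This is a rough sketch in Python that assumes the "# Samples: ... of event '...'" and "# Event count (approx.): N" line layout shown above; the function name parse_event_counts is hypothetical, and the layout may differ between perf versions:

```python
import re
from typing import Dict


def parse_event_counts(header_text: str) -> Dict[str, int]:
    """Map each event name to its approximate count from perf report header text."""
    counts: Dict[str, int] = {}
    current_event = None
    for line in header_text.splitlines():
        # A "Samples" line names the event that the next count belongs to.
        m = re.search(r"#\s*Samples:.*of event '([^']+)'", line)
        if m:
            current_event = m.group(1)
            continue
        m = re.search(r"#\s*Event count \(approx\.\):\s*(\d+)", line)
        if m and current_event is not None:
            counts[current_event] = int(m.group(1))
            current_event = None
    return counts


sample_output = """\
# Samples: 1K of event 'cycles'
# Event count (approx.): 834809036
# Samples: 1K of event 'instructions'
# Event count (approx.): 1834083643
"""
print(parse_event_counts(sample_output))
```

In practice the text would come from running perf report --header and capturing its standard output.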
A fixed period can be used, but the kernel may limit some events if the value of the -c option is too low (samples should not be generated more often than 1000-4000 times per second):
$ echo '3^123456%3' | perf record -e cycles,instructions,branches,branch-misses -c 1000000 bc
[ perf record: Captured and wrote 0.118 MB perf.data (3029 samples) ]
$ perf report --header |egrep Event\|Samples
# Samples: 823 of event 'cycles'
# Event count (approx.): 823000000
# Samples: 1K of event 'instructions'
# Event count (approx.): 1842000000
# Samples: 349 of event 'branches'
# Event count (approx.): 349000000
# Samples: 15 of event 'branch-misses'
# Event count (approx.): 15000000
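perf stat also prints derived ratios next to the raw counters ("insn per cycle", "% of all branches"). Given estimated counts like the ones above, the same ratios can be approximated. A small sketch in Python, where derived_metrics and the dictionary keys are my own naming rather than anything perf provides:

```python
from typing import Dict


def derived_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Compute perf-stat-style ratios from estimated event counts."""
    metrics: Dict[str, float] = {}
    cycles = counts.get("cycles")
    instructions = counts.get("instructions")
    branches = counts.get("branches")
    misses = counts.get("branch-misses")
    # Skip a ratio when its denominator is missing or zero.
    if cycles and instructions is not None:
        metrics["insn per cycle"] = instructions / cycles
    if branches and misses is not None:
        metrics["branch-miss %"] = 100.0 * misses / branches
    return metrics


# Estimated counts taken from the -c 1000000 run above
estimated = {
    "cycles": 823_000_000,
    "instructions": 1_842_000_000,
    "branches": 349_000_000,
    "branch-misses": 15_000_000,
}
for name, value in derived_metrics(estimated).items():
    print(f"{name}: {value:.2f}")
```

For this run the ratios come out near the 2.22 insn per cycle and 4.41% branch misses reported by perf stat, but they inherit the sampling error of the underlying estimates, which is largest for rare events such as branch-misses (only 15 samples here).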