I use "perf stat
" command to do a statistic of some events:
[root@root test]# perf stat -a -e "r81d0","r82d0" -v ./a
r81d0: 71800964 1269047979 1269006431
r82d0: 26655201 1284214869 1284214869
Performance counter stats for './a':
71,800,964 r81d0 [100.00%]
26,655,201 r82d0
0.036892349 seconds time elapsed
(1) I know 71800964
is the count of "r81d0
", but what is the meaning of 1269047979
and 1269006431
?
(2) What is the meaning of "[100.00%]
"?
I have tried to "perf stat --help
", but can't get the explanations of these values.
perf stat: obtain event counts. perf record: record events for later reporting. perf report: break down events by process, function, etc. perf annotate: annotate assembly or source code with event counts. perf top: see live event count.
perf report is able to auto-detect whether a perf. data file contains branch stacks and it will automatically switch to the branch view mode, unless --no-branch-stack is used. --branch-history Add the addresses of sampled taken branches to the callstack. This allows to examine the path the program took to each sample.
The perf command is used as a primary interface to the Linux kernel performance monitoring capabilities and can record CPU performance counters and trace points.
Perf is a facility comprised of kernel infrastructure for gathering various events and userspace tool to get gathered data from the kernel and analyze it. It is like a gprof, but it is non-invasive, low-overhead and profile the whole stack, including your app, libraries, system calls AND kernel with CPU!
[root@root test]# perf stat -a -e "r81d0","r82d0" -v ./a
r81d0: 71800964 1269047979 1269006431
r82d0: 26655201 1284214869 1284214869
This is output from verbose option, as defined in tools/perf/builtin-stat.c file of the kernel:
391 /*
392 * Read out the results of a single counter:
393 * aggregate counts across CPUs in system-wide mode
394 */
395 static int read_counter_aggr(struct perf_evsel *counter)
408 if (verbose) {
409 fprintf(output, "%s: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
410 perf_evsel__name(counter), count[0], count[1], count[2]);
411 }
count is from struct perf_counts_values
, defined as http://lxr.free-electrons.com/source/tools/perf/util/evsel.h?v=3.18#L12 with array of three uint64_t values, named as val
, ena
, run
Three count
values are filled by kernel and are read from fd, opened with perf_event_open()
syscall. There is related part of man perf_event_open
: http://man7.org/linux/man-pages/man2/perf_event_open.2.html
read_format
This field specifies the format of the data returned by
read(2) on a perf_event_open() file descriptor.
PERF_FORMAT_TOTAL_TIME_ENABLED
Adds the 64-bit time_enabled field. This can be used
to calculate estimated totals if the PMU is
overcommitted and multiplexing is happening.
PERF_FORMAT_TOTAL_TIME_RUNNING
Adds the 64-bit time_running field. This can be used
to calculate estimated totals if the PMU is
overcommitted and multiplexing is happening. ...
perf stat
enables all TIME flags if scale
is true -
298 if (scale)
299 attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
300 PERF_FORMAT_TOTAL_TIME_RUNNING;
So, first counter is raw event count; second is proportional to the time when this event was collected and last is proportional to the total running time. This is needed when you asks perf
to stat on high number of events, which can't be monitored at once (hardware usually has up to 5-7 performance monitors). In such case in-kernel perf will run subsets of required event for some parts of execution; and subsets will be changed several times. With ena
and run
counts, perf can estimate how inaccurate event monitoring was in case of multiplexing.
Performance counter stats for './a':
71,800,964 r81d0 [100.00%]
26,655,201 r82d0
And in your case two events were mapped in the same time without needs of multiplexing; your ena
and run
counters are close. And print_aggr
function prints their ratio:
1137 val += counter->counts->cpu[cpu].val;
1138 ena += counter->counts->cpu[cpu].ena;
1139 run += counter->counts->cpu[cpu].run;
Print_noise will output in case of -r N
option to rerun task N times to get statistics (man: --repeat=<n>
repeat command and print average + stddev (max: 100))
1176 print_noise(counter, 1.0);
And there is [100.00%]
printer:
1178 if (run != ena)
1179 fprintf(output, " (%.2f%%)",
1180 100.0 * run / ena);
It will not print 100% if both run and ena times are equal, and your r82d0 event have equal. Your r81d0 event have slightly different run and ena, so 100% is printed in one line.
I know that perf stat -d
can be inaccurate, because it asks for too much events; and there will be not 100% mulitplexing, but something like 53%. It means "this event was counted only in 53% of the program runtime in some random parts of it"; and if you program have several separated computing stages, events with low run/ena ratio will be less accurate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With