perf stat -d ./sample.out
Output is:
Performance counter stats for './sample.out':
0.586266 task-clock (msec) # 0.007 CPUs utilized
2 context-switches # 0.003 M/sec
1 cpu-migrations # 0.002 M/sec
116 page-faults # 0.198 M/sec
7,35,790 cycles # 1.255 GHz [81.06%]
<not counted> stalled-cycles-frontend
<not supported> stalled-cycles-backend
<not counted> instructions
<not counted> branches
<not counted> branch-misses
<not supported> L1-dcache-loads:HG
<not counted> L1-dcache-load-misses:HG
<not counted> LLC-loads:HG
<not supported> LLC-load-misses:HG
0.088013919 seconds time elapsed
I have read why <not supported> shows up. But I am getting <not counted> even for basic counters like instructions, branches, etc. Can anyone suggest how to make it work?
The interesting thing is:
sudo perf stat sleep 3
gives output:
Performance counter stats for 'sleep 3':
0.598484 task-clock (msec) # 0.000 CPUs utilized
2 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
181 page-faults # 0.302 M/sec
<not counted> cycles
<not counted> stalled-cycles-frontend
<not supported> stalled-cycles-backend
<not counted> instructions
<not counted> branches
<not counted> branch-misses
sudo perf stat -C 1 sleep 3
Performance counter stats for 'CPU(s) 1':
3002.640578 task-clock (msec) # 1.001 CPUs utilized [100.00%]
425 context-switches # 0.142 K/sec [100.00%]
9 cpu-migrations # 0.003 K/sec [100.00%]
5 page-faults # 0.002 K/sec
7,82,97,019 cycles # 0.026 GHz [33.32%]
9,38,21,585 stalled-cycles-frontend # 119.83% frontend cycles idle [33.32%]
<not supported> stalled-cycles-backend
3,09,81,643 instructions # 0.40 insns per cycle
# 3.03 stalled cycles per insn [33.32%]
70,15,390 branches # 2.336 M/sec [33.32%]
6,38,644 branch-misses # 9.10% of all branches [33.32%]
3.001075650 seconds time elapsed
Why does this unexpectedly work?
Thank you
The typical problem with perf stat -d on very short programs is not statistical sampling but multiplexing. The percentage in square brackets (for example [33.32%]) means that the counter was active for only about 33% of the running time.
You are asking the PMU (the performance monitoring unit of the CPU) to monitor too many events at once, and perf cannot map all of the requested events onto real hardware counters at the same time. A typical PMU may have something like 4, 7, or 8 independent counters, and that number may be halved if some SMT technology is enabled (for example, HT, Hyper-Threading).
When you ask perf to count that many events (you have 6 supported HW events in your perf stat output), it divides them into smaller groups. The kernel switches between the groups at certain points in time, whenever perf_events gets a chance to change them, for example on a task-clock tick (~3 ms).
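To make the bracketed percentage concrete, here is a minimal sketch of the scaling that multiplexing implies: a counter that was only scheduled for part of the run is extrapolated by time enabled divided by time running. The helper name scale_count and the numbers passed to it are hypothetical, chosen only for illustration; they are not taken from the output above.

```shell
#!/bin/sh
# Sketch: extrapolate a multiplexed counter to the whole run.
# Usage: scale_count RAW TIME_ENABLED TIME_RUNNING
# scale_count is a hypothetical helper, not part of perf.
scale_count() {
    awk -v raw="$1" -v enabled="$2" -v running="$3" 'BEGIN {
        # A counter that never got onto the hardware has no data to scale.
        if (running == 0) { print "<not counted>"; exit }
        printf "%.0f\n", raw * enabled / running
    }'
}

# Example values (assumed): the counter was on the PMU for
# 1000 of 3000 time units, i.e. about 33% of the run.
scale_count 1000000 3000 1000
```

If the program exits before a group is ever scheduled, the running time for its events is zero, which is the case the sketch reports as <not counted>.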
You can split your run into several runs with smaller sets of events, with any number of SW events and 2-4 HW events per run:
perf stat -e task-clock,page-faults,cycles,stalled-cycles-frontend
perf stat -e task-clock,page-faults,cycles,instructions
perf stat -e task-clock,page-faults,branches,branch-misses
perf stat -e task-clock,page-faults,L1-dcache-load-misses:HG,LLC-loads:HG
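The four runs above can be generated by a small script. This is only a sketch: it assumes the program under test is ./sample.out from the question (overridable through a PROG variable, which is a name I made up), and it prints the command lines instead of executing them, so you can review them first.

```shell
#!/bin/sh
# Sketch: one perf stat run per small group of hardware events.
# PROG and perf_cmd are hypothetical names used for illustration.
PROG="${PROG:-./sample.out}"

# Software events are not limited by PMU counters, so every run keeps them.
SW_EVENTS="task-clock,page-faults"

# Print the perf command line for one group of hardware events.
perf_cmd() {
    printf 'perf stat -e %s,%s %s\n' "$SW_EVENTS" "$1" "$PROG"
}

# Same groups as listed above, 2 HW events each.
for hw in \
    "cycles,stalled-cycles-frontend" \
    "cycles,instructions" \
    "branches,branch-misses" \
    "L1-dcache-load-misses:HG,LLC-loads:HG"
do
    perf_cmd "$hw"
done
```

Saved as, say, split_events.sh, the printed commands can be executed by piping the output to sh. Each run measures a separate execution of the program, so the counts from different runs are comparable only if the program behaves the same way every time.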