I am confused about the difference between perf record and perf stat when it comes to counting events like page-faults, cache-misses, and anything else from perf list. I have two questions below. The answer to "Question 1" might also help answer "Question 2", but I wrote them out explicitly in case it doesn't.
Question 1:
It is my understanding that perf stat gets a "summary" of counts, but when used with the -I option it gets the counts at the specified millisecond interval. With this option, does it sum up the counts over the interval, get the average over the interval, or do something else entirely? I assume it is summed up. The perf wiki states it is aggregated, but I guess that could mean either.
Question 2:
Why doesn't perf stat -e <event1> -I 1000 sleep 5 give about the same counts as if I summed up the counts over each second for the following command: perf record -e <event1> -F 1000 sleep 5?
For example, if I use "page-faults" as the event for event1, I get the outputs listed below under each command. (I am assuming the period field holds the counts for the event in perf record's perf.data file.)
PERF STAT
perf stat -e page-faults -I 1000 sleep 5
# time counts unit events
1.000252928 54 page-faults
2.000498389 <not counted> page-faults
3.000569957 <not counted> page-faults
4.000659987 <not counted> page-faults
5.000837864 2 page-faults
PERF RECORD
perf record -e page-faults -F 1000 sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (6 samples) ]
perf script -F period
1
1
1
5
38
164
I expected that if I summed up the counts from perf stat I would get the same as the sum from perf record. If I use the -c option with perf record and give an argument of 1, I do get a close match. Is this just a coincidence because of the relatively low number of page faults?
References I have used so far:
Thanks in advance for any and all insight you can provide.
First of all, your test case of using sleep and page-faults is not ideal. There should be no page-fault events while sleep is sleeping, so you can't really expect anything interesting. For easier reasoning, I suggest using the ref-cycles (hardware) event and a busy workload such as awk 'BEGIN { while(1){} }'.
Question 1: It is my understanding that perf stat gets a "summary" of counts but when used with the -I option gets the counts at the specified millisecond interval. With this option does it sum up the counts over the interval or get the average over the interval, or something else entirely? I assume it is summed up.
Yes. The values are just summed up. You can confirm that by testing:
$ perf stat -e ref-cycles -I 1000 timeout 10s awk 'BEGIN { while(1){} }'
# time counts unit events
1.000105072 2,563,666,664 ref-cycles
2.000267991 2,577,462,550 ref-cycles
3.000415395 2,577,211,936 ref-cycles
4.000543311 2,577,240,458 ref-cycles
5.000702131 2,577,525,002 ref-cycles
6.000857663 2,577,156,088 ref-cycles
[ ... snip ... ]
[ Note that it may not be as nicely consistent on all systems due to dynamic frequency scaling ]
$ perf stat -e ref-cycles -I 3000 timeout 10s awk 'BEGIN { while(1){} }'
# time counts unit events
3.000107921 7,736,108,718 ref-cycles
6.000265186 7,732,065,900 ref-cycles
9.000372029 7,728,302,192 ref-cycles
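To check the "summed, not averaged" claim against the numbers above, here is a minimal Python sketch that regroups the 1000 ms output into 3000 ms buckets. The helper names parse_interval_counts and regroup are made up for this illustration, and the parser only assumes the column layout shown in this answer (timestamp, count, event name).

```python
ONE_SECOND_OUTPUT = """\
# time counts unit events
1.000105072 2,563,666,664 ref-cycles
2.000267991 2,577,462,550 ref-cycles
3.000415395 2,577,211,936 ref-cycles
4.000543311 2,577,240,458 ref-cycles
5.000702131 2,577,525,002 ref-cycles
6.000857663 2,577,156,088 ref-cycles
"""


def parse_interval_counts(text):
    """Parse `perf stat -I` lines into (timestamp_seconds, count) pairs.

    Lines reported as "<not counted>" are kept with a count of None.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        try:
            timestamp = float(fields[0])
        except ValueError:
            continue  # skip anything that does not start with a timestamp
        if fields[1].startswith("<"):
            rows.append((timestamp, None))
        else:
            rows.append((timestamp, int(fields[1].replace(",", ""))))
    return rows


def regroup(rows, factor):
    """Sum every `factor` consecutive interval counts into one coarser interval."""
    counts = [count or 0 for _, count in rows]
    return [
        sum(counts[i:i + factor])
        for i in range(0, len(counts) - factor + 1, factor)
    ]


print(regroup(parse_interval_counts(ONE_SECOND_OUTPUT), 3))
# [7718341150, 7731921548]
```

The two sums are within about 0.3% of the 7,736,108,718 and 7,732,065,900 reported by the separate -I 3000 run, which is what you would expect if each interval line is a plain sum. An average would have stayed near 2.58 billion.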
Question 2: Why doesn't perf stat -e <event1> -I 1000 sleep 5 give about the same counts as if I summed up the counts over each second for the following command perf record -e <event1> -F 1000 sleep 5?
The perf stat -I argument is in milliseconds, whereas the perf record -F argument is in Hz (1/s), so the command corresponding to perf stat -I 1000 is perf record -F 1. In fact, with our more stable event/workload, this looks better:
$ perf stat -e ref-cycles -I 1000 timeout 10s awk 'BEGIN { while(1){} }'
# time counts unit events
1.000089518 2,578,694,534 ref-cycles
2.000203872 2,579,866,250 ref-cycles
3.000294300 2,579,857,852 ref-cycles
4.000390273 2,579,964,842 ref-cycles
5.000488375 2,577,955,536 ref-cycles
6.000587028 2,577,176,316 ref-cycles
7.000688250 2,577,334,786 ref-cycles
8.000785388 2,577,581,500 ref-cycles
9.000876466 2,577,511,326 ref-cycles
10.000977965 2,577,344,692 ref-cycles
10.001195845 466,674 ref-cycles
$ perf record -e ref-cycles -F 1 timeout 10s awk 'BEGIN { while(1){} }'
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.008 MB perf.data (17 samples) ]
$ perf script -F time,period
3369070.273722: 1
3369070.273755: 1
3369070.273911: 3757
3369070.273916: 3015133
3369070.274486: 1
3369070.274556: 1
3369070.274657: 1778
3369070.274662: 2196921
3369070.275523: 47192985748
3369072.663696: 2578692405
3369073.663547: 2579122382
3369074.663609: 2580015300
3369075.664085: 2579873741
3369076.664433: 2578638211
3369077.664379: 2578378119
3369078.664175: 2578166440
3369079.663896: 2579238122
So you see, eventually the results are also stable for perf record -F. Unfortunately, the documentation of perf record is very poor. You can learn what the settings -c and -F mean by looking at the documentation of the underlying system call, man perf_event_open:
sample_period, sample_freq
A "sampling" event is one that generates an overflow notification every N events, where N is given by sample_period. A sampling event has sample_period > 0. When an overflow occurs, requested data is recorded in the mmap buffer. The sample_type field controls what data is recorded on each overflow.
sample_freq can be used if you wish to use frequency rather than period. In this case, you set the freq flag. The kernel will adjust the sampling period to try and achieve the desired rate. The rate of adjustment is a timer tick.
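The fixed-period case described in the quote is simple arithmetic, sketched below in Python. The function names are hypothetical, and the total of 56 page faults is only borrowed from the perf stat run in the question (54 + 2); a separate perf record run is not guaranteed to see exactly the same number.

```python
def samples_in_period_mode(total_events, sample_period):
    """Overflow samples taken when one sample fires every `sample_period` events."""
    if sample_period <= 0:
        raise ValueError("a sampling event needs sample_period > 0")
    return total_events // sample_period


def events_covered(total_events, sample_period):
    """Events accounted for by summing the period field of all samples."""
    return samples_in_period_mode(total_events, sample_period) * sample_period


# Assumed total: 54 + 2 page faults, taken from the perf stat run in the question.
total_page_faults = 56

print(samples_in_period_mode(total_page_faults, 1), events_covered(total_page_faults, 1))
# 56 56
print(samples_in_period_mode(total_page_faults, 10), events_covered(total_page_faults, 10))
# 5 50
```

With a period of 1 every event produces its own sample, so the summed periods equal the event count. That is consistent with the close match the question reports for -c 1. With a larger period, events after the last overflow never show up in a sample.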
So while perf stat uses an internal timer to read the value of the counter every -I milliseconds, perf record sets an event overflow counter to take a sample every -c events. That means it takes a sample every N events (e.g. every N page-faults or cycles). With -F, it tries to regulate this overflow value to achieve the desired frequency. It tries different values and tunes them up or down accordingly. This eventually works for counters with a stable rate, but gives erratic results for dynamic events.
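To make the "tunes up or down" behaviour concrete, here is a toy feedback loop in Python. It is not the kernel's algorithm: the function name, the gain parameter, and the update rule are all assumptions chosen only to show why a steady event rate converges while a bursty one keeps swinging.

```python
def simulate_frequency_mode(event_rates, target_hz, start_period=1.0, gain=0.5):
    """Toy model of frequency-mode sampling (NOT the kernel's code).

    event_rates: events per second observed while each successive sample
                 is being collected (must be > 0).
    Returns the sample period that was in effect for each sample.
    """
    period = float(start_period)
    periods = []
    for rate in event_rates:
        periods.append(period)
        seconds_for_sample = period / rate  # time needed to count `period` events
        achieved_hz = 1.0 / seconds_for_sample
        # Period that would have produced exactly target_hz at this event rate.
        ideal_period = period * achieved_hz / target_hz
        # Move only part of the way there, like a damped controller.
        period += gain * (ideal_period - period)
    return periods


# Steady workload: roughly the ref-cycles rate seen above, sampled at 1 Hz.
steady = simulate_frequency_mode([2.58e9] * 40, target_hz=1)

# Bursty workload: the event rate alternates between very low and very high.
bursty = simulate_frequency_mode([1e3, 1e9] * 20, target_hz=1)

print(steady[:3], steady[-1])
print(bursty[-4:])
```

In the steady case the period starts at 1 and settles near 2.58e9, much like the tail of the perf script output above. In the bursty case it never settles, so the period column of individual samples says little about how many events happened in any given second.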