Running perf stat ls shows this:
     Performance counter stats for 'ls':

              1.388670 task-clock                #    0.067 CPUs utilized
                     2 context-switches          #    0.001 M/sec
                     0 cpu-migrations            #    0.000 K/sec
                   266 page-faults               #    0.192 M/sec
               3515391 cycles                    #    2.531 GHz
               2096636 stalled-cycles-frontend   #   59.64% frontend cycles idle
       <not supported> stalled-cycles-backend
               2927468 instructions              #    0.83  insns per cycle
                                                 #    0.72  stalled cycles per insn
                615636 branches                  #  443.328 M/sec
                 22172 branch-misses             #    3.60% of all branches

           0.020657192 seconds time elapsed
Why is stalled-cycles-backend shown as "not supported"? What kind of CPU, hardware, kernel or user-space software do I need to see this value?
So far I have tried this on RHEL with Linux 3.12 for x86_64, with a matching perf version, on different Intel Core i5 and i7 systems (Ivy Bridge type). None of them support stalled-cycles-backend.
Some more info:
    $ perf list | grep stalled
      stalled-cycles-frontend OR idle-cycles-frontend         [Hardware event]
      stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]

    $ ls /sys/devices/cpu/events/
    branch-instructions  bus-cycles    cache-references  instructions  mem-stores
    branch-misses        cache-misses  cpu-cycles        mem-loads     stalled-cycles-frontend

    $ cat /sys/bus/event_source/devices/cpu/events/stalled-cycles-frontend
    event=0x0e,umask=0x01,inv,cmask=0x01
Edit: just tried this on an AMD Phenom II X6 1045T CPU, under Ubuntu 12.04 with Linux 3.2 (32bit) - and here it does show values for both stalled-cycles-frontend and stalled-cycles-backend.
Looks like perf has not been updated to understand all the performance monitoring events that Ivy Bridge supports. Fortunately there is a generic, albeit cumbersome, interface that allows you to access the full list of performance monitoring events. I didn't see stalled-cycles-backend in the list when I gave it a quick look, but maybe I missed it, or maybe they have broken it down by all the different events that could stall the backend.
We start with perf list --help, which shows the following NOTE:
1. Intel(R) 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide http://www.intel.com/Assets/PDF/manual/253669.pdf
...armed with that URL you end up in
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
...you want section 19.3
19.3 PERFORMANCE MONITORING EVENTS FOR 3RD GENERATION INTEL® CORE™ PROCESSORS 3rd generation Intel® Core™ processors and Intel Xeon processor E3-1200 v2 product family are based on Intel microarchitecture code name Ivy Bridge. They support architectural performance-monitoring events listed in Table 19-1. Non-architectural performance-monitoring events in the processor core are listed in Table 19-5. The events in Table 19-5 apply to processors with CPUID signature of DisplayFamily_DisplayModel encoding with the following values: 06_3AH.
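To check whether a given machine matches the 06_3AH signature quoted above, the family and model numbers can be formatted the way the manual writes them. This is only a sketch: display_signature is a hypothetical helper name, and it assumes the two values are passed in decimal, as they appear in the "cpu family" and "model" fields of /proc/cpuinfo on Linux.

```shell
# Format a CPU family/model pair as DisplayFamily_DisplayModel,
# the notation used in the Intel manual (e.g. 06_3AH).
# display_signature is a hypothetical helper, not a standard tool.
display_signature() {
    # $1 = cpu family, $2 = model (decimal, as shown in /proc/cpuinfo)
    printf '%02X_%02XH\n' "$1" "$2"
}

# Family 6, model 58 is the Ivy Bridge signature from section 19.3.
display_signature 6 58   # prints 06_3AH
```

If the result differs from 06_3AH, Table 19-5 does not apply and the section for the matching signature is the one to consult.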
...so for architectural events you need Table 19-1:
19.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS Architectural performance events are introduced in Intel Core Solo and Intel Core Duo processors. They are also supported on processors based on Intel Core microarchitecture. Table 19-1 lists pre-defined architectural performance events that can be configured using general-purpose performance counters and associated event-select registers.
**Table 19-1. Architectural Performance Events**
... now comes the tricky part: you build a 4-hex-digit raw event number for perf stat by taking the UMask Value as the upper 2 hex digits and the Event Num as the lower 2 hex digits.
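The digit ordering is easy to get backwards, so here is a minimal sketch of the composition. raw_event is a hypothetical helper name, not part of perf; it only formats the two bytes and does not handle extra modifiers such as inv or cmask.

```shell
# Build a perf raw event code (rNNNN) from a umask and an event number.
# raw_event is a hypothetical helper, not something perf provides.
raw_event() {
    # $1 = UMask Value (upper 2 hex digits)
    # $2 = Event Num   (lower 2 hex digits)
    # Both may be given with a 0x prefix or in decimal.
    printf 'r%02x%02x\n' "$1" "$2"
}

# UMask 41H with Event Num 2EH gives the code used for cache-misses here.
raw_event 0x41 0x2e   # prints r412e
```

The result can then be passed to perf on the command line, for example perf stat -e "$(raw_event 0x41 0x2e)" date.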
perf stat --help
    -e, --event=
        Select the PMU event. Selection can be a symbolic event name (use perf list to list all events) or a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor.
... it says NNN but you can give it NNNN. Let's verify that this works by asking perf stat for cache-misses both as a symbolic event name and as a hex number from Table 19-1. We'll invoke the date command for simplicity.
    $ perf stat -e r412e -e cache-misses date
    Fri Mar 28 09:28:52 CDT 2014

     Performance counter stats for 'date':

                  2292 r412e
                  2292 cache-misses

           0.003322663 seconds time elapsed
As you can see, both reported the same number, so far so good. Now we go to Table 19-5 for the non-architectural hardware registers, of which there are too many to list here, but I'll list a few: