
Why doesn't perf report cache misses?

According to perf tutorials, perf stat is supposed to report cache misses using hardware counters. However, on my system (up-to-date Arch Linux), it doesn't:

[joel@panda goog]$ perf stat ./hash

 Performance counter stats for './hash':

     869.447863 task-clock                #    0.997 CPUs utilized
             92 context-switches          #    0.106 K/sec
              4 cpu-migrations            #    0.005 K/sec
          1,041 page-faults               #    0.001 M/sec
  2,628,646,296 cycles                    #    3.023 GHz
    819,269,992 stalled-cycles-frontend   #   31.17% frontend cycles idle
    132,355,435 stalled-cycles-backend    #    5.04% backend  cycles idle
  4,515,152,198 instructions              #    1.72  insns per cycle
                                          #    0.18  stalled cycles per insn
  1,060,739,808 branches                  # 1220.015 M/sec
      2,653,157 branch-misses             #    0.25% of all branches

    0.871766141 seconds time elapsed

What am I missing? I already searched the man page and the web, but didn't find anything obvious.

Edit: my CPU is an Intel i5 2300K, if that matters.

asked Feb 03 '13 16:02 by static_rtti



2 Answers

On my system (an Intel Xeon X5570 @ 2.93 GHz), I was able to get perf stat to report cache references and misses by requesting those events explicitly with -e:

perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations sleep 5

 Performance counter stats for 'sleep 5':

          10573 cache-references
           1949 cache-misses              #   18.434 % of all cache refs
        1077328 cycles                    #    0.000 GHz
         715248 instructions              #    0.66  insns per cycle
         151188 branches
            154 faults
              0 migrations

    5.002776842 seconds time elapsed
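The 18.434 % figure in that output is simply cache-misses divided by cache-references. As a rough sketch (not part of perf itself), the Python below pulls the integer counters out of perf stat's human-readable text and recomputes that ratio. The function names `parse_counters` and `cache_miss_percent` are made up for illustration, and the parser assumes the line layout shown in this thread, which may differ between perf versions.

```python
import re

# Matches "<integer count> <event-name>" at the start of a perf stat line,
# e.g. "   1,949 cache-misses   #   18.434 % of all cache refs".
# Fractional values such as "0.344308 task-clock" are deliberately skipped.
_COUNTER_LINE = re.compile(r"^\s*(\d[\d,]*)\s+([A-Za-z][\w-]*)")


def parse_counters(perf_output):
    """Return {event_name: count} for the integer counters in perf stat text."""
    counters = {}
    for line in perf_output.splitlines():
        match = _COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1).replace(",", ""))
    return counters


def cache_miss_percent(counters):
    """Return misses as a percentage of references, or None if unavailable."""
    refs = counters.get("cache-references")
    misses = counters.get("cache-misses")
    if not refs or misses is None:
        return None
    return 100.0 * misses / refs


# Sample text copied from the run above (assumed layout).
SAMPLE = """\
 Performance counter stats for 'sleep 5':

          10573 cache-references
           1949 cache-misses              #   18.434 % of all cache refs
        1077328 cycles                    #    0.000 GHz

    5.002776842 seconds time elapsed
"""

if __name__ == "__main__":
    print(round(cache_miss_percent(parse_counters(SAMPLE)), 3))  # prints 18.434
```

If the cache events are missing from the output, as in a default run, `cache_miss_percent` returns None rather than guessing.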

The default set of events did not include cache events, which matches your results. I don't know why they are left out:

perf stat -B sleep 5

 Performance counter stats for 'sleep 5':

       0.344308 task-clock                #    0.000 CPUs utilized
              1 context-switches          #    0.003 M/sec
              0 CPU-migrations            #    0.000 M/sec
            154 page-faults               #    0.447 M/sec
         977183 cycles                    #    2.838 GHz
         586878 stalled-cycles-frontend   #   60.06% frontend cycles idle
         430497 stalled-cycles-backend    #   44.05% backend  cycles idle
         720815 instructions              #    0.74  insns per cycle
                                          #    0.81  stalled cycles per insn
         152217 branches                  #  442.095 M/sec
           7646 branch-misses             #    5.02% of all branches

    5.002763199 seconds time elapsed

answered Oct 07 '22 19:10 by amdn


In the latest source code, the default event list still does not include cache-misses or cache-references:

struct perf_event_attr default_attrs[] = {

  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK              },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES        },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS          },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS             },

  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES              },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND  },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS            },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS     },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES           },

};

So the man page and most web tutorials are out of date on this point.
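Since the defaults leave the cache events out, one workaround is to pass the default events and the cache events together via -e. The Python below is a small sketch of building such a command line. The helper name `build_perf_stat_argv` is hypothetical, and the event names are the ones printed in the outputs quoted in this thread; some of them (for example the stalled-cycles events) may not be supported on every CPU or perf version.

```python
import shlex

# Event names as printed in the default `perf stat` output quoted above.
DEFAULT_EVENTS = (
    "task-clock",
    "context-switches",
    "cpu-migrations",
    "page-faults",
    "cycles",
    "stalled-cycles-frontend",
    "stalled-cycles-backend",
    "instructions",
    "branches",
    "branch-misses",
)

# The two events that have to be requested explicitly.
CACHE_EVENTS = ("cache-references", "cache-misses")


def build_perf_stat_argv(command, extra_events=CACHE_EVENTS):
    """Return an argv list running `command` under perf stat with extra events."""
    events = list(DEFAULT_EVENTS)
    for event in extra_events:
        if event not in events:  # avoid listing an event twice
            events.append(event)
    # "--" separates perf's own options from the profiled command.
    return ["perf", "stat", "-e", ",".join(events), "--"] + list(command)


if __name__ == "__main__":
    argv = build_perf_stat_argv(["./hash"])
    # Print a copy-pasteable shell command; nothing is executed here.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The returned list can be handed to `subprocess.run` on a machine where perf is installed; the sketch only prints the command so it stays runnable anywhere.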

answered Oct 07 '22 20:10 by acgtyrant