According to perf tutorials, perf stat is supposed to report cache misses using hardware counters. However, on my system (up-to-date Arch Linux), it doesn't:
    [joel@panda goog]$ perf stat ./hash

     Performance counter stats for './hash':

            869.447863 task-clock                #    0.997 CPUs utilized
                    92 context-switches          #    0.106 K/sec
                     4 cpu-migrations            #    0.005 K/sec
                 1,041 page-faults               #    0.001 M/sec
         2,628,646,296 cycles                    #    3.023 GHz
           819,269,992 stalled-cycles-frontend   #   31.17% frontend cycles idle
           132,355,435 stalled-cycles-backend    #    5.04% backend cycles idle
         4,515,152,198 instructions              #    1.72  insns per cycle
                                                 #    0.18  stalled cycles per insn
         1,060,739,808 branches                  # 1220.015 M/sec
             2,653,157 branch-misses             #    0.25% of all branches

           0.871766141 seconds time elapsed
What am I missing? I already searched the man page and the web, but didn't find anything obvious.
Edit: my CPU is an Intel i5 2300K, if that matters.
A cache miss occurs either because the data was never placed in the cache, or because the data was removed (“evicted”) from the cache by either the caching system itself or an external application that specifically made that eviction request.
You can also annotate using the perf top command. Run the perf top command and press 'a' on any particular symbol that you want to annotate. It also dynamically updates data at a fixed interval.
On my system, an Intel Xeon X5570 @ 2.93 GHz, I was able to get perf stat to report cache references and misses by requesting those events explicitly, like this:
    perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations sleep 5

     Performance counter stats for 'sleep 5':

                 10573 cache-references
                  1949 cache-misses              #   18.434 % of all cache refs
               1077328 cycles                    #    0.000 GHz
                715248 instructions              #    0.66  insns per cycle
                151188 branches
                   154 faults
                     0 migrations

           5.002776842 seconds time elapsed
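The percentage perf prints next to cache-misses appears to be simply misses divided by references. As a rough sketch (the function names `parse_counts` and `miss_ratio_percent` are made up for illustration, and the sample text is copied from the output above rather than captured live), the figure can be recomputed from the raw counts:

```python
import re

# Sample lines copied from the `perf stat -e ...` output quoted in this answer.
SAMPLE = """\
10573 cache-references
1949 cache-misses # 18.434 % of all cache refs
1077328 cycles # 0.000 GHz
715248 instructions # 0.66 insns per cycle
"""


def parse_counts(text):
    """Map event name -> integer count for every 'COUNT EVENT' line.

    Hypothetical helper: it only understands integer counts, optionally
    with thousands separators, followed by an event name.
    """
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d,]+)\s+([A-Za-z][\w-]*)", line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    return counts


def miss_ratio_percent(counts):
    """Cache misses as a percentage of cache references."""
    return 100.0 * counts["cache-misses"] / counts["cache-references"]


counts = parse_counts(SAMPLE)
print(f"{miss_ratio_percent(counts):.3f} % of all cache refs")
```

Keep in mind that cache-references and cache-misses are generic event names; which cache level they actually count depends on the CPU.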
The default set of events did not include cache events, which matches your results. I don't know why:
    perf stat -B sleep 5

     Performance counter stats for 'sleep 5':

              0.344308 task-clock                #    0.000 CPUs utilized
                     1 context-switches          #    0.003 M/sec
                     0 CPU-migrations            #    0.000 M/sec
                   154 page-faults               #    0.447 M/sec
                977183 cycles                    #    2.838 GHz
                586878 stalled-cycles-frontend   #   60.06% frontend cycles idle
                430497 stalled-cycles-backend    #   44.05% backend cycles idle
                720815 instructions              #    0.74  insns per cycle
                                                 #    0.81  stalled cycles per insn
                152217 branches                  #  442.095 M/sec
                  7646 branch-misses             #    5.02% of all branches

           5.002763199 seconds time elapsed
In the latest source code, the default event set again does not include cache-misses and cache-references:
    struct perf_event_attr default_attrs[] = {
      { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK              },
      { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES        },
      { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS          },
      { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS             },
      { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES              },
      { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
      { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND  },
      { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS            },
      { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS     },
      { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES           },
    };
So the man page and most web tutorials are out of date on this point.
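Since the defaults cannot be relied on to include cache events, one workaround is to always pass the event list yourself. Here is a minimal sketch of a wrapper that builds such a command line; the helper name `perf_stat_argv` is hypothetical, and the event names are the ones perf prints in the default output quoted in this thread (some of them, such as the stalled-cycles events, may be reported as unsupported on other CPUs):

```python
import shlex

# Event names as they appear in the default `perf stat` output quoted above.
DEFAULT_EVENTS = [
    "task-clock", "context-switches", "cpu-migrations", "page-faults",
    "cycles", "stalled-cycles-frontend", "stalled-cycles-backend",
    "instructions", "branches", "branch-misses",
]

# The two events that have to be requested explicitly.
CACHE_EVENTS = ["cache-references", "cache-misses"]


def perf_stat_argv(command, extra_events=CACHE_EVENTS):
    """Build an argv list for `perf stat` with the default events plus extras.

    Hypothetical helper: `command` is the program to measure, given as a
    list of arguments. Duplicate event names are dropped.
    """
    events = DEFAULT_EVENTS + [e for e in extra_events if e not in DEFAULT_EVENTS]
    # `--` separates perf's own options from the measured command.
    return ["perf", "stat", "-e", ",".join(events), "--"] + list(command)


argv = perf_stat_argv(["./hash"])
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` on a machine where perf is installed; the printed line is the equivalent shell command.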