I am trying to figure out why a modified C program is running faster than its non modified counter part (I am adding very few lines of code to perform some additional work). In this context, I suspect "cache effects" to be the main explanation (instruction cache). Thus I reach the perf
(https://perf.wiki.kernel.org/index.php/Main_Page) profiling tool but unfortunately I am not able to understand the meaning of its outputs regarding cache misses.
Several events about cache are provided:
cache-references [Hardware event]
cache-misses [Hardware event]
L1-dcache-loads [Hardware cache event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-dcache-store-misses [Hardware cache event]
L1-dcache-prefetches [Hardware cache event]
L1-dcache-prefetch-misses [Hardware cache event]
L1-icache-loads [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
L1-icache-prefetches [Hardware cache event]
L1-icache-prefetch-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-stores [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-prefetches [Hardware cache event]
LLC-prefetch-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-prefetches [Hardware cache event]
dTLB-prefetch-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
iTLB-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
branch-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-stores [Hardware cache event]
node-store-misses [Hardware cache event]
node-prefetches [Hardware cache event]
node-prefetch-misses [Hardware cache event]
Where can I find explanation about these fields ? cache-misses event is always smaller than other events. What does this event measure ?
How to interpret the 26,760 L1-icache-load-misses for ls vs the 5,708 cache-misses in the following example ?
perf stat -e L1-icache-load-misses ls
caches caches~ out
Performance counter stats for 'ls':
26,760 L1-icache-load-misses
0.002816690 seconds time elapsed
perf stat -e cache-misses ls
caches caches~ out
Performance counter stats for 'ls':
5,708 cache-misses
0.002822122 seconds time elapsed
Some answers:
L1
is the Level-1 cache, the smallest and fastest one. LLC
on the other hand refers to the last level of the cache hierarchy, thus denoting the largest but slowest cache.i
vs. d
distinguishes instruction cache from data cache. Only L1 is split in this way, other caches are shared between data and instructions.TLB
refers to the translation lookaside buffer, a cache used when mapping virtual addresses to physical ones.You seem to think that the cache-misses
event is the sum of all other kind of cache misses (L1-dcache-load-misses
, and so on). That is actually not true.
the cache-misses
event represents the number of memory access that could not be served by any of the cache.
I admit that perf's documentation is not the best around.
However, one can learn quite a lot about it by reading (assuming that you already have a good knowledge of how a CPU and a performance monitoring unit work, this is clearly not a computer architecture course) the doc of the perf_event_open() function:
http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
For example, by reading it you can see that the cache-misses
event showed by perf list corresponds to PERF_COUNT_HW_CACHE_MISSES
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With