 

How to clear L1, L2 and L3 caches?

I am doing some cache performance measuring and I need to ensure the caches are empty of "useful" data before timing.

Assuming an L3 cache is 10MB, would it suffice to create a vector of 10M/4 = 2,500,000 floats (4 bytes each), iterate through the whole vector and sum the numbers? Would that empty the whole cache of any data that was in it before iterating through the vector?
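A minimal C++ sketch of the approach described in the question, assuming a 10 MB L3 taken as 10,000,000 bytes and 4-byte floats. The names kAssumedL3Bytes, floats_to_cover and sweep_buffer are hypothetical, chosen for illustration; on a real machine you would query the cache size rather than hard-code it.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumption from the question: a 10 MB L3, taken here as 10,000,000 bytes.
// Query the real cache size on your machine instead of hard-coding it.
constexpr std::size_t kAssumedL3Bytes = 10'000'000;

static_assert(sizeof(float) == 4, "the question's arithmetic assumes 4-byte floats");

// 10,000,000 bytes / 4 bytes per float = 2,500,000 floats.
constexpr std::size_t floats_to_cover(std::size_t cache_bytes) {
    return cache_bytes / sizeof(float);
}

// Reads every element once, pulling the buffer's cache lines in and evicting
// older data. Returning the sum gives the loop an observable result, so the
// compiler cannot drop it as long as the caller uses the value.
float sweep_buffer(const std::vector<float>& buffer) {
    assert(!buffer.empty());
    return std::accumulate(buffer.begin(), buffer.end(), 0.0f);
}
```

For example, you could allocate std::vector<float> buf(floats_to_cover(kAssumedL3Bytes), 1.0f) once, call sweep_buffer(buf) right before each timed run, and print or otherwise use the returned sum so the loop is not optimized away.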

user997112 asked May 15 '14 22:05


2 Answers

Yes, that should be sufficient for flushing the L3 cache of useful data.

I have done similar measurements and cross-checked them with Intel's cache performance counters to confirm that I incurred the expected number of L3 cache misses during my tests.

If you want to be absolutely sure, you should also use the counters. In particular, you can measure last-level cache misses with Event Select 2EH, Umask 41H, which is available on most Intel architectures.
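A Linux-specific sketch of reading that counter around a piece of code, assuming an Intel CPU and a kernel that permits perf_event_open for the current user (see /proc/sys/kernel/perf_event_paranoid). The helper names are hypothetical. The raw encoding packs the event select into bits 0-7 and the umask into bits 8-15, which gives the same value that perf stat -e r412e would use. On non-Intel CPUs the raw value means something else, and inside many containers the syscall is blocked, in which case the sketch returns -1.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Intel raw event encoding as used by Linux perf: event select in bits 0-7,
// unit mask in bits 8-15. Event 2EH / Umask 41H becomes 0x412E.
constexpr std::uint64_t intel_raw_event(std::uint8_t event_select, std::uint8_t umask) {
    return (static_cast<std::uint64_t>(umask) << 8) | event_select;
}
static_assert(intel_raw_event(0x2E, 0x41) == 0x412E, "matches perf's r412e");

// Opens a disabled, user-space-only counter for the calling thread on any CPU.
// Returns -1 if the kernel, hardware or permissions do not allow it.
int open_llc_miss_counter() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = intel_raw_event(0x2E, 0x41);
    attr.disabled = 1;        // start stopped; enabled explicitly below
    attr.exclude_kernel = 1;  // count only user-space misses
    attr.exclude_hv = 1;
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
}

// Runs `work` with the counter enabled and returns the number of events
// counted, or -1 if the counter could not be opened or read.
template <typename F>
long long count_llc_misses(F&& work) {
    int fd = open_llc_miss_counter();
    if (fd == -1) return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    std::uint64_t value = 0;
    ssize_t n = read(fd, &value, sizeof(value));
    close(fd);
    if (n != static_cast<ssize_t>(sizeof(value))) return -1;
    return static_cast<long long>(value);
}
```

You would wrap the timed work, e.g. count_llc_misses([&] { run_benchmark(); }), where run_benchmark stands for your own (hypothetical) test function, and compare the result with the number of misses you expect.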

See the Intel 64 and IA-32 Architectures Software Developer's Manual (the performance monitoring chapters) for details on these counters.

merlin2011 answered Sep 21 '22 00:09


It depends on how insane you are trying to be to get your guarantee.

The x86_64 L3 cache is physically indexed, and while a 10MiB chunk that's linear in virtual space is almost certainly going to be physically contiguous on a lightly loaded machine, that's not guaranteed.

Sandy Bridge and Ivy Bridge, for example, split the L3 cache into 2MiB slices with 16-way set associativity (a 128KiB stride), so you could guarantee physical coverage by allocating the array with an mmap() call using MAP_HUGETLB, assuming standard 2-4MiB huge pages.
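A sketch of that MAP_HUGETLB allocation, assuming Linux, the 2 MiB slice / 16-way geometry quoted in the answer, and 2 MiB huge pages (the x86_64 default). It also spells out the stride arithmetic: 2 MiB / 16 ways = 128 KiB. If no huge pages are reserved (vm.nr_hugepages is 0), the huge-page mmap() fails, so the sketch falls back to ordinary pages and records that in the huge flag; in that case the physical-coverage guarantee no longer holds. FlushBuffer, allocate_flush_buffer and free_flush_buffer are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Assumed Sandy/Ivy Bridge L3 geometry from the answer.
constexpr std::size_t kSliceBytes = 2u * 1024 * 1024;  // 2 MiB per slice
constexpr std::size_t kWays = 16;                      // 16-way set associative
// Addresses this far apart map to the same set within a slice.
constexpr std::size_t kWayStride = kSliceBytes / kWays;
static_assert(kWayStride == 128 * 1024, "2 MiB / 16 ways = 128 KiB");

// Assumed huge page size (2 MiB on x86_64).
constexpr std::size_t kHugePage = 2u * 1024 * 1024;

// hugetlb mappings must be a whole number of huge pages long.
constexpr std::size_t round_up_to_huge(std::size_t bytes) {
    return (bytes + kHugePage - 1) / kHugePage * kHugePage;
}

struct FlushBuffer {
    void* ptr = nullptr;
    std::size_t bytes = 0;
    bool huge = false;  // true if backed by huge pages
};

// Tries MAP_HUGETLB first so each 2 MiB piece is physically contiguous;
// falls back to ordinary pages if no huge pages are available.
FlushBuffer allocate_flush_buffer(std::size_t bytes) {
    assert(bytes > 0);
    FlushBuffer buf;
    buf.bytes = round_up_to_huge(bytes);
    void* p = mmap(nullptr, buf.bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        buf.ptr = p;
        buf.huge = true;
        return buf;
    }
    p = mmap(nullptr, buf.bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) buf.ptr = p;
    return buf;
}

void free_flush_buffer(FlushBuffer& buf) {
    if (buf.ptr != nullptr) munmap(buf.ptr, buf.bytes);
    buf.ptr = nullptr;
}
```

Checking the huge flag after allocation tells you whether the huge-page guarantee actually applies on the machine you are measuring.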

Also, since each slice (at least on the newer Sandy/Ivy Bridge parts) is attached to a different core, and the slice a given physical address maps to is determined by a hash of some low- and middle-order address bits, you might have to make the array slightly larger than the L3 to compensate for a slightly uneven distribution across slices.

At this point, scrubbing your array a few times linearly should do the trick.
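A sketch of that last step, combining a slightly oversized buffer with a few linear passes. The 25% margin, the 64-byte line size, and the helper names scrub_bytes_for and scrub are illustrative assumptions, not values given in the answer. The sketch only reads the buffer, so it does not leave dirty lines that would have to be written back during the timed code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizing rule: L3 size plus a margin to absorb uneven slice
// hashing. The 25% default is an arbitrary illustrative choice.
std::size_t scrub_bytes_for(std::size_t l3_bytes, double margin = 0.25) {
    return static_cast<std::size_t>(static_cast<double>(l3_bytes) * (1.0 + margin));
}

// Walks the buffer linearly `passes` times, reading one byte per cache line
// (64 bytes assumed by default). The checksum is returned so the reads have
// an observable result and are not optimized away.
std::uint64_t scrub(const std::vector<std::uint8_t>& buf, int passes,
                    std::size_t line = 64) {
    assert(line > 0);
    std::uint64_t checksum = 0;
    for (int p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < buf.size(); i += line) {
            checksum += buf[i];
        }
    }
    return checksum;
}
```

For a 10 MiB L3 you might allocate scrub_bytes_for(10 * 1024 * 1024) bytes (13,107,200 with the default margin), fill them, and call scrub(buf, 3) before each measurement, using the returned checksum somewhere.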

Jeff answered Sep 17 '22 00:09