I am doing some cache performance measuring and I need to ensure the caches are empty of "useful" data before timing.
Assuming an L3 cache of 10MB, would it suffice to create a vector of 10M/4 = 2,500,000 floats, iterate through the whole of this vector, and sum the numbers? Would that empty the whole cache of any data which was in it prior to iterating through the vector?
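For reference, a minimal sketch of the flushing loop the question describes (the 10MB figure and the float element type come from the question; the real L3 size would need to be checked for the target CPU):

```cpp
#include <cstddef>
#include <vector>

// Touch every element of a buffer roughly the size of the L3 cache,
// so that data cached before the call is evicted. The 10 MB figure is
// the assumed L3 size from the question; adjust for your CPU.
float flush_l3()
{
    constexpr std::size_t kL3Bytes = 10u * 1000 * 1000;
    std::vector<float> scrub(kL3Bytes / sizeof(float), 1.0f);

    float sum = 0.0f;
    for (float x : scrub)
        sum += x;

    // Return (or otherwise use) the sum so the compiler cannot
    // optimise the whole loop away.
    return sum;
}
```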
You should be aware that the L1 and sometimes the L2 caches are per core, so even after clearing the L3 cache you could still run into trouble if your program switches cores.
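One way to sidestep that core-switching problem is to pin the measuring thread to a single core for the duration of the test. A minimal Linux-specific sketch (the core number is an arbitrary choice, and `pthread_setaffinity_np` is a GNU extension):

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the CPU_* macros and pthread_setaffinity_np
#endif
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to one core so the same L1/L2 caches are used
// throughout the measurement.
bool pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

Calling something like `pin_to_core(0)` before both the flushing loop and the timed section keeps them on the same L1/L2.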
The L2 cache contains data that is likely to be accessed by the CPU during the next stretch of execution. In most modern CPUs, the L1 and L2 caches are located on the CPU die itself.
Imagine that a CPU has to load data from the L1 cache 100 times in a row. If the L1 cache has a 1 ns access latency and a 100 percent hit rate, it takes the CPU 100 ns to perform this operation.
You'll notice that CPU caches are always described with the labels L1, L2, L3, and sometimes even L4. These denote the levels of the multi-level cache used by CPUs: L1 is level 1, L2 is level 2, and L3, of course, level 3. L1 is the fastest cache memory found in any consumer PC.
Yes, that should be sufficient for flushing the L3 cache of useful data.
I have done similar types of measurements and cross-checked them by using Intel's cache counters to verify that I incur the expected number of L3 cache misses during my tests.
If you want to be absolutely sure, you should also use the counters. In particular, you can measure last-level cache misses by using Event select 2EH, Umask 41H on most Intel architectures.
See the Intel Manual for details on these counters.
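On Linux, one way to read that counter from inside the benchmark is the perf_event_open interface. The sketch below passes the event as the raw code 0x412E (umask 41H in the high byte, event select 2EH in the low byte), which is the usual raw encoding for perf; the event numbers should still be checked against the manual for your particular CPU:

```cpp
#include <cstdint>
#include <cstdio>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Open a counter for the last-level-cache miss event
// (Event select 2EH, Umask 41H), passed to perf as raw code 0x412E.
static int open_llc_miss_counter()
{
    perf_event_attr attr{};
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = 0x412E;        // (umask 41H << 8) | event select 2EH
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    // Count for the calling thread, on whatever CPU it runs on.
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
}

int main()
{
    int fd = open_llc_miss_counter();
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    // ... run the cache-scrubbing loop and/or the timed code here ...

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t misses = 0;
    if (read(fd, &misses, sizeof(misses)) == (ssize_t)sizeof(misses))
        std::printf("L3 cache misses: %llu\n",
                    static_cast<unsigned long long>(misses));
    close(fd);
    return 0;
}
```

Alternatively, `perf stat -e LLC-load-misses ./your_benchmark` gives a quick external check without modifying the program.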
It depends on how insane you are trying to be to get your guarantee.
x86_64 L3 caches are physically indexed, and while a 10MiB chunk that is linear in virtual address space is almost certainly going to be physically contiguous on a lightly loaded machine, it's not guaranteed.
Sandy Bridge and Ivy Bridge, for example, have their L3 cache in 2MiB slices with 16-way set associativity (128KiB stride), so you could guarantee physical coverage by doing a MAP_HUGETLB mmap() call, assuming standard 2-4MiB huge pages.
Also, since each slice (on Sandy/Ivy Bridge at least) is attached to a different core, and the slice a given physical address resides on is determined by a hash of some low/middle-order address bits, you might have to make the array slightly larger than the size of the L3 to compensate for slightly uneven coverage of the slices.
At this point, scrubbing your array a few times linearly should do the trick.
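For what it's worth, a rough sketch of that huge-page approach on Linux is below. It assumes 2MiB huge pages have been reserved (e.g. via /proc/sys/vm/nr_hugepages); the 12MiB buffer size (a little more than the 10MiB L3) and the three scrub passes are arbitrary illustrative choices, not values from the answer:

```cpp
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>

int main()
{
    // Slightly more than the assumed 10 MiB L3, rounded to whole 2 MiB
    // huge pages, to cover uneven hashing of addresses across the slices.
    constexpr std::size_t kBytes = 12u * 1024 * 1024;

    void* buf = mmap(nullptr, kBytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap(MAP_HUGETLB)"); return 1; }

    volatile char* p = static_cast<volatile char*>(buf);

    // Scrub the buffer a few times, touching every cache line
    // (64-byte stride), so every L3 set gets overwritten.
    for (int pass = 0; pass < 3; ++pass)
        for (std::size_t i = 0; i < kBytes; i += 64)
            p[i] = static_cast<char>(pass);

    munmap(buf, kBytes);
    return 0;
}
```

Summing the buffer with reads, as in the question, works just as well; the point is simply to touch every cache line of a buffer at least as large as the L3.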