How to mark some memory ranges as non-cacheable from C++?

Tags:

caching

I was reading the Wikipedia article on CPU caches here: http://en.wikipedia.org/wiki/CPU_cache#Replacement_Policies

Marking some memory ranges as non-cacheable can improve performance, by avoiding caching of memory regions that are rarely re-accessed. This avoids the overhead of loading something into the cache, without having any reuse.

Now, I've been reading and learning about how to write programs with better cache performance (general considerations, usually not specific to C++), but I did not know that high-level code could interact with CPU caching behavior explicitly. So my question is: is there a way to do what I quoted from that article in C++?

Also, I would appreciate resources on how to improve cache performance specifically in C++, even if they do not use any functions that deal directly with the CPU caches. For example, I'm wondering whether excessive levels of indirection (e.g., a container of pointers to containers of pointers) can hurt cache performance.


asked Mar 03 '12 06:03

newprogrammer

2 Answers

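On Windows, you can call VirtualProtect(ptr, length, PAGE_READWRITE | PAGE_NOCACHE, &oldFlags) to disable caching for the pages covering that range. PAGE_NOCACHE is a modifier, so it must be combined with a base protection such as PAGE_READWRITE; the same flags can also be passed to VirtualAlloc when the memory is allocated.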
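Here is a minimal sketch of the allocation-time variant, assuming a Windows build. The `_WIN32` guard lets it compile elsewhere, where it simply returns `nullptr`. `AllocUncached` and `FreeUncached` are hypothetical helper names. `VirtualAlloc`, `VirtualFree`, `MEM_RESERVE | MEM_COMMIT` and `PAGE_READWRITE | PAGE_NOCACHE` are the real Win32 names. Microsoft's documentation advises using `PAGE_NOCACHE` only when a device explicitly requires it, so treat this as something to experiment with rather than a general speed-up: uncached accesses are usually much slower than cached ones.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: commit `bytes` of read/write memory with caching
// disabled. PAGE_NOCACHE is a modifier, so it is OR'd with PAGE_READWRITE.
// Returns nullptr on failure, or on non-Windows builds in this sketch.
void* AllocUncached(std::size_t bytes) {
    assert(bytes > 0);
#ifdef _WIN32
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT,
                        PAGE_READWRITE | PAGE_NOCACHE);
#else
    (void)bytes;
    return nullptr;  // no equivalent provided outside Windows here
#endif
}

// Hypothetical helper: release memory obtained from AllocUncached.
void FreeUncached(void* p) {
#ifdef _WIN32
    if (p != nullptr) {
        VirtualFree(p, 0, MEM_RELEASE);  // size must be 0 with MEM_RELEASE
    }
#else
    (void)p;
#endif
}
```

The memory is rounded up to whole pages, so even a one-byte request gets at least one uncached page.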

Regarding too many levels of indirection: yes, they can hurt cache performance when each dereference lands on a different piece of memory, which is the usual case, because every extra pointer hop is another load that may miss the cache. Note, though, that if you consistently dereference the same set of, say, 8 blocks of memory and only the 9th block differs, it generally won't make much difference, because those 8 blocks stay cached after the first access.
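To make the layout difference concrete, here is a hedged sketch with hypothetical function names. It sums the same values stored two ways: contiguously in a `std::vector<int>`, and behind one level of indirection in a `std::vector<std::unique_ptr<int>>`. The flat version reads consecutive ints that share cache lines. The indirect version loads a pointer and then follows it to a separately allocated `int`. In a small test the allocator may place those blocks close together, so the gap typically appears only with larger or fragmented heaps. Measure your own workload rather than relying on this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Contiguous storage: a single sequential pass; each cache line holds several ints.
long long SumFlat(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// One extra level of indirection: each element lives in its own heap block,
// so every access is a dependent load to a possibly unrelated address.
long long SumIndirect(const std::vector<std::unique_ptr<int>>& v) {
    long long total = 0;
    for (const auto& p : v) {
        assert(p != nullptr);
        total += *p;
    }
    return total;
}

// Fills both layouts with the values 0 .. n-1.
void MakeBoth(std::size_t n, std::vector<int>& flat,
              std::vector<std::unique_ptr<int>>& indirect) {
    flat.clear();
    indirect.clear();
    flat.reserve(n);
    indirect.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        flat.push_back(static_cast<int>(i));
        indirect.push_back(std::make_unique<int>(static_cast<int>(i)));
    }
}
```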

answered Nov 05 '22 09:11

user541686

Some platforms support non-temporal loads and stores, which bypass the caches. That avoids the cost of evicting whatever was previously in the cache. Standard C++ has no portable way to request them, so you generally have to use compiler intrinsics or write your own assembly code. Since even the existence of a cache is platform-specific, the existence of ways to control its use is likewise platform-specific. On x86, SSE and SSE2 provide non-temporal stores (e.g. MOVNTPS, MOVNTDQ), and SSE4.1 added a non-temporal load (MOVNTDQA).
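Here is a minimal x86 sketch of non-temporal stores written with intrinsics rather than hand-written assembly. It assumes an SSE2-capable target, which every x86-64 compiler qualifies as. `StreamCopy` is a hypothetical helper. `_mm_loadu_si128`, `_mm_stream_si128` and `_mm_sfence` are the standard intrinsics available through `<emmintrin.h>`. On other targets the sketch falls back to `memcpy`. The SSE4.1 load counterpart, `_mm_stream_load_si128` in `<smmintrin.h>`, needs SSE4.1 enabled at compile time and is not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics; also pulls in _mm_sfence
#define HAVE_SSE2_STREAM 1
#endif

// Copies `bytes` from src to dst using non-temporal (cache-bypassing) stores
// when SSE2 is available. Preconditions assumed by this sketch:
// dst is 16-byte aligned and bytes is a multiple of 16.
void StreamCopy(void* dst, const void* src, std::size_t bytes) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(bytes % 16 == 0);
#ifdef HAVE_SSE2_STREAM
    auto* d = static_cast<__m128i*>(dst);
    const auto* s = static_cast<const __m128i*>(src);
    for (std::size_t i = 0; i < bytes / 16; ++i) {
        __m128i v = _mm_loadu_si128(s + i);  // ordinary (cached) unaligned load
        _mm_stream_si128(d + i, v);          // non-temporal store (MOVNTDQ)
    }
    _mm_sfence();  // order the weakly ordered streaming stores before later stores
#else
    std::memcpy(dst, src, bytes);  // fallback: regular cached copy
#endif
}
```

Streaming stores mainly pay off for large buffers that will not be read again soon. For small or soon-reused data they can be slower than a normal copy.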

If you generally work on x86 platforms other than Windows, this article on x86 and x86-64 GCC intrinsics is probably the most useful.
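GCC and Clang also expose a cache hint that needs no intrinsics header: `__builtin_prefetch(addr, rw, locality)`. Here is a minimal sketch, assuming GCC or Clang; other compilers simply skip the hint. Passing locality `0` says the data has no temporal locality, and on x86 that is typically emitted as `PREFETCHNTA`. `SumOnce` and the prefetch distance `kAhead` are hypothetical choices. A prefetch is only a hint, so the result is the same with or without it.

```cpp
#include <cassert>
#include <cstddef>

// Sums n ints that are read once and not reused soon.
// __builtin_prefetch(addr, rw, locality): rw = 0 means "prefetch for read",
// locality = 0 means "no temporal locality" (a non-temporal hint).
long long SumOnce(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    constexpr std::size_t kAhead = 64;  // hypothetical prefetch distance, in elements
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)
        if (i + kAhead < n) {
            __builtin_prefetch(data + i + kAhead, 0, 0);
        }
#endif
        total += data[i];
    }
    return total;
}
```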

answered Nov 05 '22 10:11

David Schwartz
