I was reading the Wikipedia article on CPU caches here: http://en.wikipedia.org/wiki/CPU_cache#Replacement_Policies
Marking some memory ranges as non-cacheable can improve performance, by avoiding caching of memory regions that are rarely re-accessed. This avoids the overhead of loading something into the cache, without having any reuse.
Now, I've been reading and learning about how to write programs with better cache performance (general considerations, usually not specific to C++), but I didn't know that high-level code could interact with CPU caching behavior explicitly. So my question is: is there a way to do what I quoted from that article in C++?
I'd also appreciate resources on improving cache performance specifically in C++, even if they don't use any functions that deal directly with the CPU caches. For example, I'm wondering whether excessive levels of indirection (e.g., a container of pointers to containers of pointers) can damage cache performance.
Normal Non-Cacheable memory is not looked up in any cache. The requests are sent directly to memory. Read requests might over-read in memory, for example, reading 64 bytes of memory for a 4-byte access, and might satisfy multiple memory requests with a single external memory access.
Dynamic information, which changes regularly or with each user request, serves no purpose if cached. Web pages that return search results are non-cacheable because their contents are almost always unique.
The main memory in your system that can move its information into your system's cache memory is called the “cacheable memory.” Memory in your system that is not cacheable performs as if your system is cacheless, moving information as needed directly to the processor without the ability to use the cache memory as a fast ...
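On Windows, you can call VirtualProtect(ptr, length, PAGE_READWRITE | PAGE_NOCACHE, &oldFlags) to make a memory range uncached. Note that PAGE_NOCACHE is a modifier: it has to be combined with a protection value such as PAGE_READWRITE rather than passed on its own.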
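Here is a minimal sketch of the Windows approach, assuming you control the allocation. Microsoft's documentation describes PAGE_NOCACHE as something to request when allocating private memory with VirtualAlloc, and discourages it outside device-driver scenarios, so this version sets it at allocation time. The helper names alloc_uncached and free_uncached are hypothetical; on non-Windows platforms the sketch simply returns nullptr, since there is no portable user-space equivalent.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: reserves and commits `size` bytes of read/write memory
// that the CPU will not cache. Returns nullptr on failure or on non-Windows.
void* alloc_uncached(std::size_t size) {
    assert(size > 0 && "VirtualAlloc rejects zero-sized requests");
#ifdef _WIN32
    // PAGE_NOCACHE only modifies a base protection, so OR it with PAGE_READWRITE.
    return VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE,
                        PAGE_READWRITE | PAGE_NOCACHE);
#else
    (void)size;  // no portable way to request uncached memory from user space
    return nullptr;
#endif
}

// Hypothetical helper: releases memory obtained from alloc_uncached.
void free_uncached(void* ptr) {
#ifdef _WIN32
    if (ptr != nullptr) {
        VirtualFree(ptr, 0, MEM_RELEASE);  // size must be 0 with MEM_RELEASE
    }
#else
    (void)ptr;
#endif
}
```

Keep in mind that every access to such memory goes all the way to RAM, so this is usually much slower than cached memory; it only pays off in the narrow cases the Wikipedia quote describes.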
Regarding too many indirections: yes, they can hurt cache performance if each dereference lands in a different region of memory, which is what usually happens. Note, though, that if you consistently dereference the same set of, e.g., 8 blocks of memory and only the 9th block differs, it generally won't make much difference, because those 8 blocks will stay cached after the first access.
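To make the indirection point concrete, here is a hedged sketch comparing a contiguous std::vector<int> with a std::vector<std::unique_ptr<int>>, which adds one level of pointer chasing. All function names are hypothetical. Both sums compute the same result; the difference is only in the memory access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Contiguous layout: the ints are packed next to each other, so each cache
// line fetched from memory serves several consecutive elements.
long long sum_contiguous(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}

// One extra level of indirection: each element lives in its own heap
// allocation, so every iteration loads a pointer and then dereferences it,
// which may touch an unrelated cache line.
long long sum_indirect(const std::vector<std::unique_ptr<int>>& values) {
    long long total = 0;
    for (const auto& p : values) {
        assert(p && "every slot is expected to hold an element");
        total += *p;
    }
    return total;
}

// Builds 0, 1, ..., n-1 stored contiguously.
std::vector<int> make_contiguous(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    return v;
}

// Builds the same values, each in a separate heap allocation.
std::vector<std::unique_ptr<int>> make_indirect(std::size_t n) {
    std::vector<std::unique_ptr<int>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(std::make_unique<int>(static_cast<int>(i)));
    }
    return v;
}
```

If you time both sums over a few million elements, expect the gap to depend heavily on the allocator: small allocations made back to back are often placed close together, so a fresh microbenchmark can understate the cost compared with a long-running program whose heap is fragmented.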
Some platforms support non-temporal loads and stores that bypass the caches. That avoids the cost of evicting whatever was previously in the cache. They're generally not exposed directly by higher-level languages, so you have to write your own assembly code or use compiler intrinsics. But since even the existence of a cache is platform-specific, the ways to control its use are likewise platform-specific. On x86, non-temporal stores have been available since SSE/SSE2, and SSE4.1 added a non-temporal load instruction (MOVNTDQA).
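On x86, compilers expose these instructions as intrinsics. A minimal sketch, assuming an x86 target with SSE2 (the default on x86-64): the hypothetical helper fill_nontemporal writes a buffer with _mm_stream_si32 (MOVNTI), then issues _mm_sfence because streaming stores are weakly ordered. On other targets it falls back to ordinary stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_stream_si32; also pulls in _mm_sfence
#define FILL_NT_USE_SSE2 1
#endif

// Hypothetical helper: fills `count` ints at `dst` with `value` without
// pulling the destination lines into the cache (where supported).
void fill_nontemporal(int* dst, std::size_t count, int value) {
    assert(dst != nullptr || count == 0);
#ifdef FILL_NT_USE_SSE2
    for (std::size_t i = 0; i < count; ++i) {
        _mm_stream_si32(dst + i, value);  // MOVNTI: store that bypasses the caches
    }
    _mm_sfence();  // make the weakly ordered streaming stores globally visible in order
#else
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = value;  // ordinary cached stores on non-x86 targets
    }
#endif
}

// Convenience overload for a whole vector.
void fill_nontemporal(std::vector<int>& v, int value) {
    fill_nontemporal(v.data(), v.size(), value);
}
```

This only makes sense for large buffers that won't be read again soon; for small or soon-reused data, bypassing the cache just turns the next read into a miss. Production code would typically use 16-byte _mm_stream_si128 stores on aligned data rather than one int at a time.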
If, like me, you mostly deal with x86 platforms other than Windows, this article on x86 and x86-64 GCC intrinsics is probably the most useful.
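GCC and Clang also provide __builtin_prefetch(addr, rw, locality), where locality 0 hints that the data has no temporal locality (on x86 this is typically emitted as PREFETCHNTA). The sketch below, with a hypothetical function name and an assumed prefetch distance of 64 elements, sums a buffer that is read only once. It is a hint, not a guarantee, and hardware prefetchers already handle simple sequential scans well, so measure before relying on it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums `count` ints read exactly once, hinting that the
// cache should not keep them around.
long long sum_streaming(const int* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    constexpr std::size_t kAhead = 64;  // assumed prefetch distance; tune per machine
    long long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        // Bounds check keeps the pointer arithmetic inside the array.
        if (i + kAhead < count) {
            __builtin_prefetch(data + i + kAhead, 0 /* read */, 0 /* no reuse */);
        }
#endif
        total += data[i];
    }
    return total;
}
```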