I am interested in forcing a CPU cache flush in Windows (for benchmarking reasons, I want to emulate starting with no data in CPU cache), preferably a basic C implementation or Win32 call.
Is there a known way to do this with a system call or even something as sneaky as doing say a large memcpy
?
Intel i686 platform (P4 and up is okay as well).
A pipeline flush, is also known as a pipeline break or a pipeline stall. It's a procedure enacted by a CPU when it cannot ensure that it will correctly process its instruction pipeline in the next clock cycle.
Flush cache definition Cache flushing will clear that information in order to help smoothen and improve computer speed. In other words, everything including data and applications contained in that cache will be removed.
The CPU control unit automatically checks cache for instructions before requesting data from RAM. This saves fetching the instructions and data repeatedly from RAM – a relatively slow process which might otherwise keep the CPU waiting. Transfers to and from cache take less time than transfers to and from RAM.
Fortunately, there is more than one way to explicitly flush the caches.
The instruction "wbinvd" writes back modified cache content and marks the caches empty. It executes a bus cycle to make external caches flush their data. Unfortunately, it is a privileged instruction. But if it is possible to run the test program under something like DOS, this is the way to go. This has the advantage of keeping the cache footprint of the "OS" very small.
Additionally, there is the "invd" instruction, which invalidates caches without flushing them back to main memory. This violates the coherency of main memory and cache, so you have to take care of that by yourself. Not really recommended.
For benchmarking purposes, the simplest solution is probably copying a large memory block to a region marked with WC (write combining) instead of WB. The memory mapped region of the graphics card is a good candidate, or you can mark a region as WC by yourself via the MTRR registers.
You can find some resources about benchmarking short routines at Test programs for measuring clock cycles and performance monitoring.
There are x86 assembly instructions to force the CPU to flush certain cache lines (such as CLFLUSH), but they are pretty obscure. CLFLUSH in particular only flushes a chosen address from all levels of cache (L1, L2, L3).
something as sneaky as doing say a large memcopy?
Yes, this is the simplest approach, and will make sure that the CPU flushes all levels of cache. Just exclude the cache flushing time from your benchmakrs and you should get a good idea how your program performs under cache pressure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With