Every modern high-performance CPU of the x86/x86_64 architecture has some hierarchy of data caches: L1, L2, and sometimes L3 (and L4 in very rare cases), and data loaded from/to main RAM is cached in some of them.
Sometimes the programmer may want some data to not be cached in some or all cache levels (for example, when wanting to memset 16 GB of RAM and keep some data still in the cache): there are some non-temporal (NT) instructions for this like MOVNTDQA (https://stackoverflow.com/a/37092 http://lwn.net/Articles/255364/)
But is there a programmatic way (for some AMD or Intel CPU families like P3, P4, Core, Core i*, ...) to completely (but temporarily) turn off some or all levels of the cache, to change how every memory access instruction (globally or for some applications / regions of RAM) uses the memory hierarchy? For example: turn off L1, turn off L1 and L2? Or change every memory access type to "uncached" UC (CD+NW bits of CR0??? SDM vol3a pages 423 424, 425 and "Third-Level Cache Disable flag, bit 6 of the IA32_MISC_ENABLE MSR (Available only in processors based on Intel NetBurst microarchitecture) — Allows the L3 cache to be disabled and enabled, independently of the L1 and L2 caches.").
I think such action will help to protect data from cache side channel attacks/leaks like stealing AES keys, covert cache channels, Meltdown/Spectre. Although this disabling will have an enormous performance cost.
PS: I remember such a program posted many years ago on some technical news website, but can't find it now. It was just a Windows exe to write some magical values into an MSR and make every Windows program running after it very slow. The caches were turned off until reboot or until starting the program with the "undo" option.
To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps: Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0. Flush all caches using the WBINVD instruction.
You can set L3 cache as the default, so that when new node pools are created, L3 cache is enabled automatically. You can turn on L3 cache for a specific node pool. You can disable L3 cache for SSDs on a node pool and restore those SSDs to storage drives.
They are extra caches built between the CPU and the RAM. Sometimes L2 is built into the CPU with L1. L2 and L3 caches take slightly longer to access than L1. The more L2 and L3 memory available, the faster a computer can run.
The Intel's manual 3A, Section 11.5.3, provides an algorithm to globally disable the caches:
11.5.3 Preventing Caching
To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps:
- Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.
- Flush all caches using the WBINVD instruction.
- Disable the MTRRs and set the default memory type to uncached or set all MTRRs for the uncached memory type (see the discussion of the discussion of the TYPE field and the E flag in Section 11.11.2.1, “IA32_MTRR_DEF_TYPE MSR”).
The caches must be flushed (step 2) after the CD flag is set to ensure system memory coherency. If the caches are not flushed, cache hits on reads will still occur and data will be read from valid cache lines.
The intent of the three separate steps listed above addresses three distinct requirements: (i) discontinue new data replacing existing data in the cache (ii) ensure data already in the cache are evicted to memory, (iii) ensure subsequent memory references observe UC memory type semantics. Different processor implementation of caching control hardware may allow some variation of software implementation of these three requirements. See note below.
NOTES Setting the CD flag in control register CR0 modifies the processor’s caching behaviour as indicated in Table 11-5, but setting the CD flag alone may not be sufficient across all processor families to force the effective memory type for all physical memory to be UC nor does it force strict memory ordering, due to hardware implementation variations across different processor families. To force the UC memory type and strict memory ordering on all of physical memory, it is sufficient to either program the MTRRs for all physical memory to be UC memory type or disable all MTRRs.
For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and before the MTRRS have actually been disabled may be retained in the cache hierarchy. Here, to remove code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled.
That's a long quote but it boils down to this code
;Step 1 - Enter no-fill mode
mov eax, cr0
or eax, 1<<30 ; Set bit CD
and eax, ~(1<<29) ; Clear bit NW
mov cr0, eax
;Step 2 - Invalidate all the caches
wbinvd
;All memory accesses happen from/to memory now, but UC memory ordering may not be enforced still.
;For Atom processors, we are done, UC semantic is automatically enforced.
xor eax, eax
xor edx, edx
mov ecx, IA32_MTRR_DEF_TYPE ;MSR number is 2FFH
wrmsr
;P4 only, remove this code from the L1I
wbinvd
most of which is not executable from user mode.
AMD's manual 2 provides a similar algorithm in section 7.6.2
7.6.2 Cache Control Mechanisms
The AMD64 architecture provides a number of mechanisms for controlling the cacheability of memory. These are described in the following sections.Cache Disable. Bit 30 of the CR0 register is the cache-disable bit, CR0.CD. Caching is enabled when CR0.CD is cleared to 0, and caching is disabled when CR0.CD is set to 1. When caching is disabled, reads and writes access main memory.
Software can disable the cache while the cache still holds valid data (or instructions). If a read or write hits the L1 data cache or the L2 cache when CR0.CD=1, the processor does the following:
- Writes the cache line back if it is in the modified or owned state.
- Invalidates the cache line.
- Performs a non-cacheable main-memory access to read or write the data.
If an instruction fetch hits the L1 instruction cache when CR0.CD=1, some processor models may read the cached instructions rather than access main memory. When CR0.CD=1, the exact behavior of L2 and L3 caches is model-dependent, and may vary for different types of memory accesses.
The processor also responds to cache probes when CR0.CD=1. Probes that hit the cache cause the processor to perform Step 1. Step 2 (cache-line invalidation) is performed only if the probe is performed on behalf of a memory write or an exclusive read.
Writethrough Disable. Bit 29 of the CR0 register is the not writethrough disable bit, CR0.NW. In early x86 processors, CR0.NW is used to control cache writethrough behavior, and the combination of CR0.NW and CR0.CD determines the cache operating mode.
[...]
In implementations of the AMD64 architecture, CR0.NW is not used to qualify the cache operating mode established by CR0.CD.
This translates to this code (very similar to the Intel's one):
;Step 1 - Disable the caches
mov eax, cr0
or eax, 1<<30
mov cr0, eax
;For some models we need to invalidated the L1I
wbinvd
;Step 2 - Disable speculative accesses
xor eax, eax
xor edx, edx
mov ecx, MTRRdefType ;MSR number is 2FFH
wrmsr
Caches can also be selectively disabled at:
IA32_PAT
with caching types and using the bits PAT, PCD, PWT as a 3-bit index it's possible to select one the six caching types (UC-, UC, WC, WT, WP, WB). Of these options only the page attributes can be exposed to user mode programs (see for example this).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With