The x86 INVD instruction invalidates the cache hierarchy without writing the contents back to memory, apparently.
I'm curious, what use is such an instruction? Given how one has very little control over what data may be in the various cache levels and even less control over what may have already been flushed asynchronously, it seems to be little more than a way to make sure you just don't know what data is held in memory anymore.
Excellent question!
One use-case for such a blunt-acting instruction as invd is in specialized or very-early-bootstrap code, such as when the presence or absence of RAM has not yet been verified. Since we might not know whether RAM is present, its size, or even if particular parts of it function properly, or we might not want to rely on it, it's sometimes useful for the CPU to program part of its own cache to operate as RAM and use it as such. This is called Cache-as-RAM (CAR). During setup of CAR, while using CAR, and during teardown of CAR mode, the kernel must ensure nothing is ever written out from that cache to memory.
To set up CAR, the CPU must be set to No-Fill Cache Mode and must designate the memory range to be used for CAR as Write-Back. This can be done by the following steps:
1. invd the entire cache, preventing any cached write from being written out and causing chaos.
2. Designate the memory range to be used for CAR as Write-Back (via the MTRRs) and enter Normal Cache Mode (cr0.CD=0).
3. Access the CAR range so that cache lines are allocated for it, then enter No-Fill Cache Mode (cr0.CD=1).

The motivation for setting up CAR is that once set up, all accesses (read/write) within the CAR region will hit cache and will not hit RAM, yet the cache's contents will be addressable and act just like RAM. Therefore, instead of writing assembler code that only ever uses registers, one can now use normal C code, provided that the stack and local/global variables it accesses are restricted to within the CAR region.
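To make the steps above concrete, here is a minimal, hypothetical sketch in C with inline assembly. The CAR base and size, the use of variable MTRR pair 0, and the 64-byte cache-line stride are assumptions made only for illustration; real pre-RAM code performs this sequence in assembly using registers only, since no usable stack exists until CAR is up.

    /* Hypothetical sketch of the CAR bring-up sequence described above.
     * CAR_BASE, CAR_SIZE and the choice of MTRR pair 0 are assumptions. */
    #include <stdint.h>

    #define MSR_MTRR_PHYSBASE0   0x200
    #define MSR_MTRR_PHYSMASK0   0x201
    #define MSR_MTRR_DEF_TYPE    0x2FF
    #define MTRR_TYPE_WB         0x06
    #define MTRR_VALID           (1u << 11)   /* PHYSMASK valid bit            */
    #define MTRR_DEF_TYPE_EN     (1u << 11)   /* enable variable-range MTRRs   */
    #define CR0_CD               (1u << 30)
    #define CR0_NW               (1u << 29)

    #define CAR_BASE  0xFEF00000u   /* assumed placement of the CAR window */
    #define CAR_SIZE  0x00010000u   /* assumed 64 KiB                      */

    static inline void wrmsr(uint32_t msr, uint64_t v)
    {
        __asm__ volatile("wrmsr"
                         :: "c"(msr), "a"((uint32_t)v), "d"((uint32_t)(v >> 32)));
    }

    static inline unsigned long read_cr0(void)
    {
        unsigned long v;
        __asm__ volatile("mov %%cr0, %0" : "=r"(v));
        return v;
    }

    static inline void write_cr0(unsigned long v)
    {
        __asm__ volatile("mov %0, %%cr0" :: "r"(v));
    }

    static void car_setup(void)
    {
        /* 1. Drop whatever is in the cache without writing any of it back. */
        __asm__ volatile("invd");

        /* 2. Mark the CAR range as Write-Back via a variable MTRR and turn the
         *    MTRRs on (default type UC). Low 32 bits of the mask only; a real
         *    implementation also sets the upper mask bits up to MAXPHYADDR. */
        wrmsr(MSR_MTRR_PHYSBASE0, (uint64_t)CAR_BASE | MTRR_TYPE_WB);
        wrmsr(MSR_MTRR_PHYSMASK0, (uint64_t)(~(CAR_SIZE - 1) & 0xFFFFF000u) | MTRR_VALID);
        wrmsr(MSR_MTRR_DEF_TYPE, MTRR_DEF_TYPE_EN);

        /* Normal Cache Mode: clear CR0.CD and CR0.NW. */
        write_cr0(read_cr0() & ~(CR0_CD | CR0_NW));

        /* 3. Touch every line of the CAR range so the cache allocates it
         *    (64-byte line size assumed). */
        for (uint32_t off = 0; off < CAR_SIZE; off += 64)
            (void)*(volatile uint32_t *)(CAR_BASE + off);

        /* No-Fill Cache Mode: set CR0.CD so the allocated lines stay put. */
        write_cr0(read_cr0() | CR0_CD);
    }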
When CAR is exited, it would be a bad thing for all of the memory writes incurred in this "pseudo-RAM" to suddenly shoot out from cache and trash any actual content at the same address in RAM. So on exit, invd is once again used to completely delete the contents of the CAR region, and then Normal Cache Mode is set up.
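A matching sketch of the teardown, reusing the read_cr0/write_cr0 helpers and CR0_* constants from the setup sketch above; again this only illustrates the sequence just described and is not production code:

    /* Tear down CAR: throw the pseudo-RAM contents away so none of its
     * writes ever reach real memory, then return to normal caching. */
    static void car_teardown(void)
    {
        /* Discard every cache line, including the CAR region's dirty ones,
         * without writing anything back. */
        __asm__ volatile("invd");

        /* Normal Cache Mode again: clear CR0.CD (and CR0.NW). */
        write_cr0(read_cr0() & ~(CR0_CD | CR0_NW));
    }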
Intel alluded to the Cache-as-RAM use in the i486 Microprocessor Programmer's Reference Manual. The Intel 80486 was the CPU that first introduced the invd instruction. Section 12.2 read:
12.2 OPERATION OF THE INTERNAL CACHE
Software controls the operating mode of the cache. Caching can be enabled (its state following reset initialization), caching can be disabled while valid cache lines exist (a mode in which the cache acts like a fast, internal RAM), or caching can be fully disabled.
Precautions must be followed when disabling the cache. Whenever CD is set to 1, the i486 processor will not read external memory if a copy is still in the cache. Whenever NW is set to 1, the i486 processor will not write to external memory if the data is in the cache. This means stale data can develop in the i486 CPU cache. This stale data will not be written to external memory if NW is later set to 0 or that cache line is later overwritten as a result of a cache miss. In general, the cache should be flushed when disabled.
It is possible to freeze data in the cache by loading it using test registers while CD and NW are set. This is useful to provide guaranteed cache hits for time critical interrupt code and data.
Note that all segments should start on 16 byte boundaries to allow programs to align code/data in cache lines.
coreboot has a slide-deck presenting their implementation of CAR, which describes the above procedure. The invd instruction is used on Slide 21.
AMD calls it Cache-as-general-storage in §2.3.3: Using L2 Cache as General Storage During Boot.
In certain situations involving cache-incoherency due to DMA (Direct Memory Access) hardware, invd might also prove useful.
To elaborate on IwillnotexistIdonotexist's answer about CAR:
I think how it's actually done is as follows:

The CAR region is presumably read to allocate its cache lines (an ordinary store would trigger an RFO read of the line anyway, although rep stos is supposed to use a no-RFO protocol like ItoM, i.e. it only sends an invalidate and not an RFO), so I would think the region needs to be mapped to an actual device, like SPI flash, because when reading from an address that has a mapping in the SAD but no receiving device behind it at the IIO or the IMC, you'd get an MCA, I think. If you actually get 0s when doing this, or use rep stos, then that would be a possible alternative.
There is no need to INVD the cache first unless you know there is something in the cache that will be taking up unnecessary space, which is typically not the case at the stage of boot before the memory controller and the RAM memory map have been configured.
I don't know whether the L3 is disabled via ia32_misc_enable; presumably it functions normally, but it's possible that the core informs the L3 slice CBo that it is operating in no-fill cache mode and won't fill the L3 on a miss. Alternatively, it might bypass the L3 entirely by issuing a UC request to the CBo. I don't know whether cache coherency is maintained for write hits, or whether coherent requests from other cores are handled; a couple of sources claim this is the case, but that's irrelevant when you know only the BSP is active. If it does hit in a lower cache, it doesn't bring the line up higher.
INVD during this state would effectively disable the cache completely, because nothing would hit in the cache afterwards, so INVD isn't used during CAR, only at the end.
When CAR is torn down, its contents are discarded with INVD. If the CAR region is backed by SPI flash, it will not match the contents of the SPI flash 100%: it will still have been written to, e.g. a stack placed over some random code, which you don't want to write back to the SPI flash accidentally, so you must INVD and not WBINVD; and if the SPI flash rejected the write you'd probably get an MCA. The fact that the instruction was introduced on the 486, when the first mention of CAR appeared, suggests it was introduced for this very purpose, and I don't think there's another use case.

On Intel, CAR is set up by microcode to run the Startup ACM, so it doesn't need a specific macroinstruction for this. INVD and CAR are, however, used by the ACM itself and by the SEC core before the memory controller has been initialised, which of course requires the INVD macroinstruction. I'll have to check whether the SEC core enables CAR or whether it's already enabled, but I do know that the IBB blocks containing the SEC + PEI are in L3.
An important thing to mention is that when you load code into the cache, you need to make sure it's pushed out of the L1d and into L2, otherwise it won't be accessible by the instruction cache. This can be achieved by loading the code and then loading something larger than the size of L1d (which is shared, not statically partitioned between threads, so it needs to be larger than the full size of L1d). I think this is because L1d is not coherent with L1i, although there is something called SMC so I'm not sure to what degree that is true.
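A rough, hypothetical illustration of that trick in C. The 32 KiB L1d size, the 64-byte stride, and the evict_buf region are assumptions; this only mirrors the procedure described in the paragraph above and is not a guaranteed way to make freshly written code fetchable.

    /* Copy code somewhere cacheable, then stream through a region larger than
     * the whole L1d so the freshly written lines are evicted down to L2,
     * where (per the paragraph above) instruction fetch can see them. */
    #include <stdint.h>
    #include <string.h>

    #define ASSUMED_L1D_SIZE  (32u * 1024u)  /* assumption: full L1d capacity */

    static void load_code_and_evict_l1d(uint8_t *dst, const uint8_t *src,
                                        uint32_t len,
                                        const volatile uint8_t *evict_buf)
    {
        volatile uint8_t sink = 0;

        /* The stores land in L1d first. */
        memcpy(dst, src, len);

        /* Read 2x the assumed L1d size (evict_buf must be at least that big)
         * to push the copied lines out of L1d. */
        for (uint32_t i = 0; i < 2 * ASSUMED_L1D_SIZE; i += 64)
            sink ^= evict_buf[i];

        (void)sink;
    }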