Does cache memory refresh itself if doesn't encounter any instruction for a threshold amount of time?
What I mean is that suppose, I have a multi-core machine and I have isolated core on it. Now, for one of the cores, there was no activity for say a few seconds. In this case, will the last instructions from the instruction cache be flushed after a certain amount of time has passed?
I understand this can be architecture dependent but I am looking for general pointers on the concept.
If a cache is power-gated in a particular idle state and if it's implemented using a volatile memory technology (such as SRAM), the cache will lose its contents. In this case, to maintain the architectural state, all dirty lines must be written to some memory structure that will retain its state (such as the next level of the memory hierarchy). Most processors support power-gating idle states. For example, on Intel processors, in the core C6 and deeper states, the core is fully power-gated including all private caches. When the core wakes up from any of these states, the caches will be cold.
It can be useful in an idle state, for the purpose of saving power, to flush a cache but not power-gate it. The ACPI specification defines such a state, called C3, in Section 8.1.4 (of version 6.3):
While in the C3 state, the processor’s caches maintain state but the processor is not required to snoop bus master or multiprocessor CPU accesses to memory.
Later in the same section it elaborates that C3 doesn't require preserving the state of caches, but also doesn't require flushing it. Essentially, a core in ACPI C3 doesn't guarantee cache coherence. In an implementation of ACPI C3, either the system software would be required to manually flush the cache before having a core enter C3 or the hardware would employ some mechanism to ensure coherence (flushing is not the only way). This idle state can potentially save more power compared to a shallower states by not having to engage in cache coherence.
To the best of my knowledge, the only processors that implement a non-power-gating version of ACPI C3 are those from Intel, starting with the Pentium II. All existing Intel x86 processors can be categorized according to how they implement ACPI C3:
Note that clock-gating saves power by:
With clock-gating, dynamic power is reduced to essentially zero. But static power is still consumed to maintain state in the volatile memory structures.
Many processors include at least one level of on-chip cache that is shared between multiple cores. The processor branded Core Solo and Core Duo (whether based on the Enhanced Pentium M or Core microarchitectures) introduced an idle state that implements ACPI C3 at the package-level where the shared cache may be gradually power-gate and restore (Intel's package-level states correspond to system-level states in the ACPI specification). This hardware state is called PC7, Enhanced Deeper Sleep State, Deep C4, or other names depending on the processor. The shared cache is much larger compared to the private caches, and so it would take much more time to fully flush. This can reduce the effectiveness of PC7. Therefore, it's flushed gradually (the last core of the package that enters CC7 performs this operation). In addition, when the package exits PC7, the shared cache is enabled gradually as well, which may reduce the cost of entering PC7 next time. This is the basic idea, but the details depend on the processor. In PC7, significant portions of the package are power-gated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With