Does Cache empty itself if idle for a long time?

Question

Does cache memory refresh itself if it doesn't encounter any instructions for some threshold amount of time?

What I mean is this: suppose I have a multi-core machine with an isolated core on it. Now, for one of the cores, there has been no activity for, say, a few seconds. In this case, will the last instructions in the instruction cache be flushed after a certain amount of time has passed?

I understand this can be architecture dependent but I am looking for general pointers on the concept.

Hadi Brais · Accepted Answer

If a cache is power-gated in a particular idle state and if it's implemented using a volatile memory technology (such as SRAM), the cache will lose its contents. In this case, to maintain the architectural state, all dirty lines must be written to some memory structure that will retain its state (such as the next level of the memory hierarchy). Most processors support power-gating idle states. For example, on Intel processors, in the core C6 and deeper states, the core is fully power-gated including all private caches. When the core wakes up from any of these states, the caches will be cold.
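To check which idle states a core actually enters, and therefore whether its private caches are likely to come back cold, one option on Linux is to read the cpuidle entries in sysfs. The following is a minimal sketch, assuming a Linux system that exposes /sys/devices/system/cpu/cpuN/cpuidle; the helper name read_idle_states is made up for this example, and the state names reported depend on the cpuidle driver in use.

```python
from pathlib import Path


def read_idle_states(cpu_dir):
    """Return one dict per idle state found under <cpu_dir>/cpuidle.

    Each dict holds the state index plus the raw text of a few sysfs
    attributes (None if the attribute file is missing).
    """
    cpuidle = Path(cpu_dir) / "cpuidle"
    if not cpuidle.is_dir():
        return []  # no cpuidle support exposed for this CPU

    states = []
    for state_dir in cpuidle.glob("state*"):
        suffix = state_dir.name[len("state"):]
        if not suffix.isdigit():
            continue  # skip anything that is not stateN
        info = {"index": int(suffix)}
        # name: state name, latency: exit latency in microseconds,
        # usage: number of times entry was requested,
        # time: total time spent in the state in microseconds
        for field in ("name", "latency", "usage", "time"):
            attr = state_dir / field
            info[field] = attr.read_text().strip() if attr.is_file() else None
        states.append(info)
    return sorted(states, key=lambda s: s["index"])


if __name__ == "__main__":
    # cpu0 is an arbitrary choice; substitute the isolated core's number.
    for s in read_idle_states("/sys/devices/system/cpu/cpu0"):
        print(s["index"], s["name"], "usage:", s["usage"], "time_us:", s["time"])
```

If the counters for C6 or a deeper state keep growing while the core is idle, the core is being power-gated and its private caches should be expected to be empty on wake-up.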

It can be useful in an idle state, for the purpose of saving power, to flush a cache but not power-gate it. The ACPI specification defines such a state, called C3, in Section 8.1.4 (of version 6.3):

"While in the C3 state, the processor’s caches maintain state but the processor is not required to snoop bus master or multiprocessor CPU accesses to memory."

Later in the same section it elaborates that C3 doesn't require preserving the state of caches, but also doesn't require flushing them. Essentially, a core in ACPI C3 doesn't guarantee cache coherence. In an implementation of ACPI C3, either the system software would be required to manually flush the cache before having a core enter C3, or the hardware would employ some mechanism to ensure coherence (flushing is not the only way). This idle state can potentially save more power than shallower states because the core doesn't have to participate in cache coherence.

To the best of my knowledge, the only processors that implement a non-power-gating version of ACPI C3 are those from Intel, starting with the Pentium II. All existing Intel x86 processors can be categorized according to how they implement ACPI C3:

Intel Core and later and Bonnell and later: The hardware state is called C3. The implementation uses multiple power-reduction mechanisms. The one relevant to the question flushes all the core caches (instruction, data, uop, paging unit), probably by executing a microcode routine on entry to the idle state. That is, all dirty lines are written back to the closest shared level of the memory hierarchy (L2 or L3) and all valid clean lines are invalidated. This is how cache coherency is maintained. The rest of the core state is retained.
Pentium II, Pentium III, Pentium 4, and Pentium M: The hardware state is called Sleep in these processors. In the Sleep state, the processor is fully clock-gated and doesn't respond to snoops (among other things). On-chip caches are not flushed and the hardware doesn't provide an alternative mechanism that protects the valid lines from becoming incoherent. Therefore, the system software is responsible for ensuring cache coherence. Otherwise, Intel specifies that if a snoop request occurs to a processor that is transitioning into or out of Sleep or already in Sleep, the resulting behavior is unpredictable.
All others don't support ACPI C3.
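The flush performed on C3 entry in the first category (write back every dirty line to the shared level, then invalidate everything) can be illustrated with a toy model. This is only a conceptual sketch in Python, not how the microcode works; the Line class, the dictionaries standing in for caches, and the addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Line:
    data: int
    dirty: bool = False


def flush_private_cache(private, shared):
    """Write dirty lines back to the shared level, then invalidate all lines.

    `private` maps address -> Line, `shared` maps address -> data.
    Returns the number of lines that had to be written back.
    """
    written_back = 0
    for addr, line in private.items():
        if line.dirty:
            shared[addr] = line.data  # only dirty lines carry newer data
            written_back += 1
    private.clear()  # clean and dirty lines alike become invalid
    return written_back


# Two dirty lines and one clean line in the private cache.
private = {0x40: Line(1, dirty=True), 0x80: Line(2), 0xC0: Line(3, dirty=True)}
shared = {0x40: 0, 0x80: 2, 0xC0: 0}
count = flush_private_cache(private, shared)
print(count, private, shared)
```

After the call the private cache holds nothing, so no snoop can ever hit a stale line in it, which is why the core no longer needs to respond to snoops while it is idle.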

Note that clock-gating saves power by:

Turning off the clock generation logic, which itself consumes power.
Turning off any logic that does something on each clock cycle.

With clock-gating, dynamic power is reduced to essentially zero. But static power is still consumed to maintain state in the volatile memory structures.
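The difference between the two techniques can be sketched with the usual first-order CMOS power model, in which dynamic power is activity × capacitance × voltage² × frequency and static power is voltage × leakage current. The function below is a rough illustration under that assumption; the name core_power and all numbers in the usage lines are hypothetical and not taken from any real processor.

```python
def core_power(v, f_hz, c_eff, activity, i_leak,
               clock_gated=False, power_gated=False):
    """Return (dynamic_watts, static_watts) under a first-order CMOS model.

    v: supply voltage (V), f_hz: clock frequency (Hz),
    c_eff: effective switched capacitance (F),
    activity: fraction of that capacitance switching per cycle,
    i_leak: leakage current (A).
    """
    if power_gated:
        # Supply is cut: no switching and no leakage, but SRAM loses its state.
        return 0.0, 0.0
    # Clock-gating stops switching; the supply stays on, so state is kept.
    dynamic = 0.0 if clock_gated else activity * c_eff * v ** 2 * f_hz
    static = v * i_leak
    return dynamic, static


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
running = core_power(1.0, 3e9, 1e-9, 0.2, 0.5)
clock_gated = core_power(1.0, 3e9, 1e-9, 0.2, 0.5, clock_gated=True)
power_gated = core_power(1.0, 3e9, 1e-9, 0.2, 0.5, power_gated=True)
print(running, clock_gated, power_gated)
```

In this model only power-gating removes the static term, and that is exactly the case in which a volatile cache loses its contents.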

Many processors include at least one level of on-chip cache that is shared between multiple cores. The processors branded Core Solo and Core Duo (whether based on the Enhanced Pentium M or Core microarchitectures) introduced an idle state that implements ACPI C3 at the package level, where the shared cache may be gradually power-gated and restored (Intel's package-level states correspond to system-level states in the ACPI specification). This hardware state is called PC7, Enhanced Deeper Sleep State, Deep C4, or other names depending on the processor. The shared cache is much larger than the private caches, and so it would take much more time to fully flush. This can reduce the effectiveness of PC7. Therefore, it's flushed gradually (the last core of the package that enters CC7 performs this operation). In addition, when the package exits PC7, the shared cache is enabled gradually as well, which may reduce the cost of entering PC7 next time. This is the basic idea, but the details depend on the processor. In PC7, significant portions of the package are power-gated.

Tags:

cpu-architecture

cpu-cache

gundechaHills
