How is an LRU cache implemented in a CPU?

Tags:

I'm studying up for an interview and want to refresh my memory on caching. If a CPU has a cache with an LRU replacement policy, how is that actually implemented on the chip? Would each cache line store a timestamp tick?

Also what happens in a dual core system where both CPUs write to the one address simultaneously?

842

asked May 03 '14 18:05

fred basset

1 Answers

For a traditional cache with only two ways, a single bit per set can be used to track LRU. On any access to a set that hits, the bit can be set to the way that did not hit.

For larger associativity, the number of states increases dramatically: factorial of the number of ways. So a 4-way cache would have 24 states, requiring 5 bits per set and an 8-way cache would have 40,320 states, requiring 16 bits per set. In addition to the storage overhead, there is also greater overhead in updating the value.

For a 4-way cache, the following encoding of the state that would seem to work reasonably well: two bits for the most recently used way number, two bits for the next most recently used way number, and a bit indicating if the higher or lower numbered way was more recently used.

On a MRU hit, the state is unchanged.
On a next-MRU hit the two bit fields are swapped.
On other hits, the numbers of the two other ways are decoded, the number of the way that hits is placed in the first two-bit portion and the former MRU way number is placed in the second two-bit portion. The final bit is set based on whether the next-MRU way number is higher or lower than the less recently used way that did not hit.
On a miss, the state is updated as if an LRU hit had occurred.

Because LRU tracking has such overhead, simpler mechanisms like binary tree pseudo-LRU are often used. On a hit, such just updates each branching part of the tree with which half of the associated ways the hit was in. For a power of two number of ways W, a binary tree pLRU cache would have W-1 bits of state per set. A hit in way 6 of an 8-way cache (using a 3-level binary tree) would clear the bit at the base of the tree to indicate that the lower half of the ways (0,1,2,3) are less recently used, clear the higher bit at the next level to indicate that the lower half of those ways (4,5) are less recently used and set the higher bit in the final level to indicate that the upper half of those ways (7) is less recently used. Not having to read this state in order to update it can simplify hardware.

For skewed associativity, where different ways use different hashing functions, something like an abbreviated time stamp has been proposed (e.g., "Analysis and Replacement for Skew-Associative Caches", Mark Brehob et al., 1997). Using a miss counter is more appropriate than a cycle count, but the basic idea is the same.

With respect to what happens when two cores try to write to the same cache line at the same time, this is handled by only allowing one L1 cache to have the cache line in the exclusive state at a given time. Effectively there is a race and one core will get exclusive access. If only one of the writing core already has the cache line in a shared state, it will probably be more likely to win the race. With the cache line in shared state, the cache only needs to send an invalidation request to other potential holders of the cache line; with the cache line not present a write would typically need to request the cache line of data as well as asking for exclusive state.

Writes by different cores to the same cache line (whether to the same specific address or, in the case of false sharing, to another address within the line of data) can result in "cache line ping pong", where different cores invalidate the cache line in other caches to get exclusive access (to perform a write) so that the cache line bounces around the system like a ping pong ball.

167

answered Nov 12 '22 15:11

Paul A. Clayton

Related questions
                            
                                NSCache's name property
                            
                                User space Vs Kernel space program performance difference
                            
                                webView not displaying website when offline
                            
                                High frequent concurrency for cache
                            
                                Cache specifications for intel core i7
                            
                                How to prevent browser cache for some files only in ASP.Net 5?
                            
                                Why is it recommended practice to store images on disk rather than in a Realm
                            
                                "New version available" with service worker and sw-precache
                            
                                when should we care about cache missing?
                            
                                ASP.NET: Where/how is web.config cached?
                            
                                Caching from URLs with a query string
                            
                                Ehcache Vs Static map cache implementation
                            
                                NSCachesDirectory not a directory in my file system
                            
                                Default cache used by Hibernate?
                            
                                How is clear page cache in the CodeIgniter
                            
                                When L1 misses are a lot different than L2 accesses... TLB related?
                            
                                Do Guava caches consider weight in eviction choices?
                            
                                Jon Skeet's Edulinq - Empty Array Caching
                            
                                How to prevent storing nil value in rails cache?
                            
                                Caching calls to an external API in a rails app

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How is an LRU cache implemented in a CPU?

Tags:

cpu-architecture

caching

cpu-cache

cpu

lru

fred basset

People also ask

1 Answers

Paul A. Clayton

Recent Activity

Donate For Us