Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?

1 Answer

L1 is very tightly coupled to the CPU core and is accessed on every memory access, which is very frequent. It therefore needs to return data very quickly (usually within a few clock cycles). Both latency and throughput (bandwidth) are performance-critical for the L1 data cache: for example, a four-cycle latency while supporting two reads and one write from the CPU core every clock cycle. It needs many read/write ports to sustain this access bandwidth. Building a large cache with these properties is impractical, so designers keep it small, e.g. 32 KB in many processors today.

L2 is accessed only on L1 misses, so it is accessed much less often (typically around 1/20th as often as L1). L2 can therefore tolerate higher latency (e.g. 10 to 20 cycles) and have fewer ports, which allows designers to make it bigger.
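A rough way to see why L2 can afford the extra latency is the standard average memory access time (AMAT) estimate. The sketch below is only illustrative: the 4-cycle L1 latency, the 1/20 L1 miss rate and the 10 to 20 cycle L2 latency come from the figures above, while the 20% L2 miss rate and the 200-cycle memory latency are assumptions chosen for the example.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time, in cycles, for a two-level cache hierarchy."""
    # Cost of an L1 miss: pay the L2 latency, plus memory latency when L2 also misses.
    l1_miss_penalty = l2_hit + l2_miss_rate * mem_latency
    # Every access pays the L1 latency; only the misses pay the penalty.
    return l1_hit + l1_miss_rate * l1_miss_penalty


# Assumed values: L2 miss rate of 0.2 and memory latency of 200 cycles.
with_fast_l2 = amat(4, 1 / 20, 10, 0.2, 200)
with_slow_l2 = amat(4, 1 / 20, 20, 0.2, 200)
with_slow_l1 = amat(5, 1 / 20, 10, 0.2, 200)

print(f"L2 at 10 cycles: {with_fast_l2:.2f} cycles per access")
print(f"L2 at 20 cycles: {with_slow_l2:.2f} cycles per access")
print(f"L1 at 5 cycles:  {with_slow_l1:.2f} cycles per access")
```

Under these assumptions, doubling the L2 latency costs about half a cycle per access on average, whereas a single extra cycle of L1 latency costs a full cycle on every access.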

L1 and L2 play very different roles. Making L1 bigger would increase its access latency, which would drastically reduce performance: every dependent load would become slower, and the added latency would be harder for out-of-order execution to hide. As a result, the L1 size leaves little room for debate.
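The effect on dependent loads can be sketched with a simple model of pointer chasing, where each load's address comes from the previous load's result, so the loads cannot overlap even on an out-of-order core. The 4-cycle latency is the example figure used earlier; the 6-cycle latency for a hypothetical larger L1 is an assumption, and the model ignores everything except L1 hit latency.

```python
def dependent_chain_cycles(num_loads, l1_latency_cycles):
    """Cycles to run a chain of loads that all hit in L1 and each depend on the last."""
    # A dependent load cannot issue until the previous load has returned,
    # so the latencies add up instead of overlapping.
    return num_loads * l1_latency_cycles


small_l1 = dependent_chain_cycles(1000, 4)  # example 4-cycle L1
large_l1 = dependent_chain_cycles(1000, 6)  # hypothetical larger L1, assumed 6 cycles

slowdown = large_l1 / small_l1
print(f"small L1: {small_l1} cycles, larger L1: {large_l1} cycles, slowdown {slowdown:.2f}x")
```

In this simplified model the slowdown on such a chain equals the ratio of the two latencies, no matter how many misses the larger cache would have avoided elsewhere.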

If we removed L2, L1 misses would have to go to the next level, say memory. Many more accesses would then go to memory, which would require more memory bandwidth, and memory bandwidth is already a bottleneck. Thus, keeping the L2 around is favorable.
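As a back-of-the-envelope illustration of this bandwidth argument, the sketch below counts the bytes fetched from memory with and without an L2. The 1/20 L1 miss rate is the figure quoted above; the 64-byte cache line and the 20% L2 miss rate are assumptions for the example, and writebacks and prefetching are ignored.

```python
LINE_BYTES = 64  # assumed cache line size


def memory_traffic_bytes(accesses, l1_miss_rate, l2_miss_rate=None):
    """Bytes fetched from memory for a given number of CPU memory accesses.

    Pass l2_miss_rate=None to model a hierarchy with no L2, where every
    L1 miss goes straight to memory.
    """
    l1_misses = accesses * l1_miss_rate
    if l2_miss_rate is None:
        memory_fetches = l1_misses
    else:
        # Only the L1 misses that also miss in L2 reach memory.
        memory_fetches = l1_misses * l2_miss_rate
    return memory_fetches * LINE_BYTES


without_l2 = memory_traffic_bytes(1000, 1 / 20)
with_l2 = memory_traffic_bytes(1000, 1 / 20, l2_miss_rate=0.2)

print(f"without L2: {without_l2:.0f} bytes per 1000 accesses")
print(f"with L2:    {with_l2:.0f} bytes per 1000 accesses")
```

With these assumed rates, the L2 cuts memory traffic to a fifth of what it would be otherwise.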

Experts often refer to L1 as a latency filter, as it makes the common case of L1 hits faster, and to L2 as a bandwidth filter, as it reduces memory bandwidth usage.

Note: I have assumed a 2-level cache hierarchy in my argument to make it simpler. In many of today's multicore chips, there is an L3 cache shared between all the cores, while each core has its own private L1 and possibly L2. In these chips, the shared last-level cache (L3) plays the role of memory bandwidth filter. L2 plays the role of on-chip bandwidth filter, i.e. it reduces accesses to the on-chip interconnect and the L3. This allows designers to use a lower-bandwidth interconnect like a ring and a slow single-port L3, which in turn allows them to make L3 bigger.

It is perhaps worth mentioning that the number of ports is a very important design point because it affects how much chip area the cache consumes. Ports add wires to the cache, and those wires consume a lot of chip area and power.

answered Oct 11 '22 14:10

Aater Suleman

Tags:

cpu-architecture

memory

caching

cpu-cache

processor

Karthik Balaguru
