
Inclusive or exclusive? L1, L2 cache in Intel Core Ivy Bridge processor

I have an Intel Core Ivy Bridge processor, an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (L1: 32KB, L2: 256KB, L3: 8MB). I know the L3 is inclusive and shared among the cores. I want to know the following with respect to my system:

PART 1:

  1. Is the L1 inclusive or exclusive?
  2. Is the L2 inclusive or exclusive?

PART 2:

If L1 and L2 are both inclusive, then to find the access time of L2 we first declare an array (1MB) larger than the L2 cache (256KB) and access the whole array to load it into the L2 cache. After that, we access the array elements from the start index to the end index with a stride of 64B, since the cache line size is 64B. To get a more accurate result, we repeat this process (accessing the array elements from start to end) many times, say 1 million times, and take the average.

My understanding of why this approach gives the correct result is as follows. When we access an array larger than the L2 cache, the whole array is loaded from main memory into L3, then from L3 into L2, then from L2 into L1. The last 32KB of the array is in L1 because it was accessed most recently. The whole array is also present in the L2 and L3 caches, due to the inclusive property and cache coherency. Now, when I start accessing the array again from the starting index, which is not in the L1 cache but is in the L2 cache, there will be an L1 miss and the line will be loaded from the L2 cache. In this way every element of the array incurs the higher L2 access time, and the measurement gives the total access time for the whole array. To get the time of a single access, I divide that total by the number of accesses.
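The procedure described above can be sketched in C roughly as follows. This is only a minimal sketch under stated assumptions: the names `avg_access_ns` and `lines_touched` are made up for illustration, timing uses POSIX `clock_gettime`, and the working-set size is a parameter rather than a fixed 1MB. It does nothing about hardware prefetching or overlapping loads, so a strided sequential walk like this may report a time lower than the real L2 latency.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define CACHE_LINE 64  /* line size in bytes, as stated in the question */

/* Number of cache lines touched when walking `bytes` with a 64B stride. */
size_t lines_touched(size_t bytes) {
    return (bytes + CACHE_LINE - 1) / CACHE_LINE;
}

/* Monotonic wall-clock time in nanoseconds. */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: average time per strided access, in ns.
   Returns -1.0 on bad arguments or allocation failure. */
double avg_access_ns(size_t bytes, size_t repeats) {
    if (bytes == 0 || repeats == 0) return -1.0;

    /* volatile keeps the compiler from removing the loads */
    volatile unsigned char *buf = malloc(bytes);
    if (!buf) return -1.0;

    size_t n = lines_touched(bytes);
    assert(n > 0);

    /* Warm-up: touch every byte so the array is brought into the caches. */
    for (size_t i = 0; i < bytes; i++) buf[i] = (unsigned char)i;

    unsigned sink = 0;
    double start = now_ns();
    for (size_t r = 0; r < repeats; r++)
        for (size_t i = 0; i < bytes; i += CACHE_LINE)
            sink += buf[i];              /* one load per cache line */
    double elapsed = now_ns() - start;

    (void)sink;
    free((void *)buf);
    return elapsed / ((double)repeats * (double)n);
}
```

For example, calling `avg_access_ns(128 * 1024, 10000)` uses a working set that is larger than the 32KB L1 but still within the 256KB L2 given in the question.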

My question is: am I correct?

Thanks in advance.

bholanath asked Nov 04 '13 19:11




1 Answer

See section 2.2.5 in the Intel optimization guide:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

(Note that this section describes Sandy Bridge, but nothing appears to have changed for Ivy Bridge, which has only minor microarchitectural changes over the previous generation.)

So regarding your questions:

  1. For the L1 there's no question of inclusiveness, as there are no upper-level caches (closer to the core) for it to be inclusive of.
  2. The L2 cache is not inclusive, meaning there's no guarantee that a line residing in the L1 is also in the L2. However, in most cases it's likely to be there, since it was probably filled into the L2 when originally requested by the core. It also has a good chance of surviving longer in the L2, since the L2 is bigger (so evictions are spread over more sets) and is filtered by the L1 (which usually means fewer evictions).

Also note that if your benchmark accesses a data set larger than the L2, it will probably not fit in the L2 (especially if you access it serially and exceed the L2 size by more than the size of a single way), and you'd have to fetch it from the L3.
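To illustrate the point about serial access, here is a toy model of a 256KB set-associative cache swept serially several times. It is a sketch under loud assumptions, none of which come from the thread: 8 ways, true LRU replacement, contiguous addresses, and no prefetching. The real L2 replacement policy may behave differently, and `last_pass_hit_rate` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE 64u    /* line size from the question */
#define WAYS 8u     /* ASSUMED associativity, not stated in the thread */
#define SETS 512u   /* 256KB / (64B * 8 ways) */

/* Hit rate of the last of `passes` serial sweeps over `bytes`,
   in an idealized cache with true LRU replacement. */
double last_pass_hit_rate(size_t bytes, unsigned passes) {
    static uint64_t tag[SETS][WAYS];
    static uint64_t stamp[SETS][WAYS];   /* last-use time, 0 = empty way */
    memset(tag, 0, sizeof tag);
    memset(stamp, 0, sizeof stamp);

    size_t lines = bytes / LINE;
    uint64_t clock = 0, hits = 0;
    if (lines == 0 || passes == 0) return 0.0;

    for (unsigned p = 0; p < passes; p++) {
        for (size_t l = 0; l < lines; l++) {
            size_t set = l % SETS;        /* consecutive lines map to consecutive sets */
            uint64_t t = l / SETS;
            unsigned victim = 0;
            int hit = 0;
            for (unsigned w = 0; w < WAYS; w++) {
                if (stamp[set][w] != 0 && tag[set][w] == t) {
                    victim = w;           /* found: refresh this way */
                    hit = 1;
                    break;
                }
                if (stamp[set][w] < stamp[set][victim])
                    victim = w;           /* track the least recently used way */
            }
            tag[set][victim] = t;
            stamp[set][victim] = ++clock;
            if (hit && p == passes - 1) hits++;
        }
    }
    return (double)hits / (double)lines;
}
```

In this model, an array of exactly 256KB hits on every access in the second sweep, while an array of 256KB plus one way (32KB with the assumed 8 ways) never hits, because each set cycles through nine lines with room for only eight.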

Leeor answered Sep 23 '22 19:09
