I have an Intel Ivy Bridge processor, an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (L1: 32KB, L2: 256KB, L3: 8MB). I know the L3 is inclusive and shared among all cores. I want to know the following with respect to my system:
PART1:
PART2:
If L1 and L2 are both inclusive, then to find the access time of L2 we first declare an array (1MB) larger than the L2 cache (256KB), then access the whole array once so it is loaded into the L2 cache. After that, we access the array elements from the start index to the end index with a stride of 64B, since the cache line size is 64B. To get a more accurate result, we repeat this process (accessing the array elements from start to end) many times, say 1 million times, and take the average.
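Roughly, the loop I have in mind looks like the sketch below (a minimal version in C; the names ARRAY_SIZE, STRIDE, and REPEAT are my own, and the timing uses the __rdtsc() intrinsic):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtsc() */

    #define ARRAY_SIZE (1024 * 1024)  /* 1MB: larger than the 256KB L2 */
    #define STRIDE     64             /* one access per 64B cache line */
    #define REPEAT     1000000        /* repeat and average */

    int main(void)
    {
        volatile char *array = malloc(ARRAY_SIZE);
        uint64_t total = 0;

        /* Warm-up pass: touch every cache line once so the array is
           pulled into the cache hierarchy. */
        for (size_t i = 0; i < ARRAY_SIZE; i += STRIDE)
            array[i] = 1;

        for (long r = 0; r < REPEAT; r++) {
            uint64_t start = __rdtsc();
            for (size_t i = 0; i < ARRAY_SIZE; i += STRIDE)
                (void)array[i];       /* one read per cache line */
            total += __rdtsc() - start;
        }

        printf("avg cycles per access: %.2f\n",
               (double)total / REPEAT / (ARRAY_SIZE / STRIDE));
        return 0;
    }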
My understanding of why this approach gives the correct result is as follows: when we access an array larger than the L2 cache, the whole array is loaded from main memory into L3, then from L3 into L2, then from L2 into L1. The last 32KB of the array is in L1, since it was accessed most recently. The whole array is also present in the L2 and L3 caches due to the inclusive property and cache coherency. Now, when I start accessing the array again from the starting index, the data is not in the L1 cache but is in the L2 cache, so there will be an L1 miss and the line will be served from L2. In this way every element of the array incurs the higher access time, and in total I get the total access time of the whole array. To get the time of a single access, I take the average over the total number of accesses.
My question is: am I correct?
Thanks in advance.
An advantage of inclusive caches is that whatever has been brought into the cache hierarchy by one core is available to the other cores. AMD processors tend to have exclusive caches; Intel processors tend to have inclusive caches.
This is an inclusive cache model, where the same data can be present in both the L1 and L2 caches. In an exclusive cache, data can be present in only one cache and an address cannot be found in both the L1 and L2 caches at the same time.
Modern CPUs also often have a very small "L0" cache, often just a few KB in size, used for storing micro-ops. AMD and Intel both use this kind of cache; Zen had a 2,048-entry µOP cache, while Zen 2 has a 4,096-entry µOP cache.
L1 is "level-1" cache memory, usually built onto the microprocessor chip itself. For example, the Intel MMX microprocessor comes with 32 thousand bytes of L1. L2 (that is, level-2) cache memory is on a separate chip (possibly on an expansion card) that can be accessed more quickly than the larger "main" memory.
See Section 2.2.5 in the Intel optimization guide:
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
(Note that this section describes Sandy Bridge, but nothing appears to have changed for Ivy Bridge, which made only minor micro-architectural changes over the previous generation.)
So regarding your questions:
Also note that if your benchmark accesses a data set larger than the L2, it will probably fail to fit in the L2 (especially if you access it serially and exceed the L2 capacity by more than the size of a single way), and you would have to fetch it from the L3.
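If the goal is a load-latency number for the L2 specifically, a common alternative is a pointer chase over a buffer sized between the L1 (32KB) and the L2 (256KB), with the chain order randomized so the hardware prefetchers can't follow it. A minimal sketch, assuming a 64B line and an illustrative 128KB working set:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtsc() */

    #define LINE     64                /* cache line size */
    #define WSET     (128 * 1024)      /* 128KB: bigger than L1, smaller than L2 */
    #define NODES    (WSET / LINE)
    #define ACCESSES 10000000UL

    int main(void)
    {
        /* One node per cache line; each node's first bytes store the
           index of the next node in the chain. */
        char *buf = aligned_alloc(LINE, WSET);
        size_t *order = malloc(NODES * sizeof *order);
        for (size_t i = 0; i < NODES; i++)
            order[i] = i;

        /* Fisher-Yates shuffle: a random visiting order defeats the
           stride-based hardware prefetchers. */
        srand(1);
        for (size_t i = NODES - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }

        /* Link all nodes into a single cycle in the shuffled order. */
        for (size_t i = 0; i < NODES; i++)
            *(size_t *)(buf + order[i] * LINE) = order[(i + 1) % NODES];

        /* Chase the chain: each load's address depends on the previous
           load's result, so the loads serialize and the average cost
           approximates the load-to-use latency. */
        size_t cur = 0;
        uint64_t start = __rdtsc();
        for (unsigned long i = 0; i < ACCESSES; i++)
            cur = *(size_t *)(buf + cur * LINE);
        uint64_t cycles = __rdtsc() - start;

        printf("avg cycles per dependent load: %.2f (end node %zu)\n",
               (double)cycles / ACCESSES, cur);
        free(order);
        free(buf);
        return 0;
    }

Shrinking WSET below 32KB should show the L1 latency, and growing it well past 256KB should show the step up to L3, which is one way to sanity-check the measurements against the figures in the optimization guide.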