I've been reading Ulrich Drepper's "What Every Programmer Should Know About Memory", and section 3.3.2, Measurements of Cache Effects (halfway down the page), gives me the impression that accessing any member of a struct causes the whole struct to be pulled into the CPU cache.
Is this correct? If so, how does the hardware know about the layout of these structs? Or does the code generated by the compiler somehow force the entire struct to be loaded?
Or are the slowdowns from using larger structs primarily due to TLB misses caused by the structs being spread out across more memory pages?
The example struct used by Drepper is:
struct l {
    struct l *n;
    long int pad[NPAD];
};
where sizeof(struct l) is determined by NPAD, which equals 0, 7, 15 or 31, resulting in structs that are 0, 56, 120 and 248 bytes apart, assuming 64-byte cache lines and 4k pages.
Just iterating through the linked list gets significantly slower as the struct grows, even though nothing other than the pointer is actually being accessed.
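To make the effect concrete, here is a rough sketch that counts how many distinct cache lines a walk touches when it reads only the leading pointer of each element. It rests on several assumptions that are not from Drepper's text: a 64-byte line, elements laid out back to back starting on a line boundary, and a hypothetical helper named lines_touched. NPAD defaults to 7 here because a zero-length array (NPAD = 0) is not standard C.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64   /* assumed line size in bytes */

#ifndef NPAD
#define NPAD 7               /* one of the values from the question: 7, 15 or 31 */
#endif

struct l {
    struct l *n;
    long int pad[NPAD];
};

static_assert(CACHE_LINE_SIZE % sizeof(struct l *) == 0,
              "a naturally aligned pointer must not straddle two lines");

/* Count the distinct cache lines touched when reading only the first
   pointer-sized field of `count` elements of `elem_size` bytes each,
   laid out back to back and starting on a cache line boundary. */
static inline size_t lines_touched(size_t elem_size, size_t count)
{
    size_t touched = 0;
    size_t prev_line = (size_t)-1;   /* sentinel: no line seen yet */

    for (size_t i = 0; i < count; i++) {
        /* Index of the line holding the start of element i. */
        size_t line = (i * elem_size) / CACHE_LINE_SIZE;
        if (line != prev_line) {
            touched++;
            prev_line = line;
        }
    }
    return touched;
}
```

Under these assumptions, lines_touched(8, 1024) is 128, because eight 8-byte elements share each line, while any element size of 64 bytes or more gives 1024: every pointer read lands on a new line. Calling lines_touched(sizeof(struct l), 1024) shows the figure for your own platform, where the result depends on the sizes of pointers and long.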
The hardware knows nothing about the struct. It is true, however, that the hardware loads some of the bytes around the ones you actually access into the cache. This is because the cache works in units of cache lines: it does not fetch memory byte by byte but one whole line at a time, e.g. 64 bytes on most current x86 processors.
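As a small illustration of what "the bytes around" means, the sketch below computes which cache line an address falls in. The 64-byte line size and the helper names line_base and same_line are assumptions made for this example; the real line size depends on the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumed line size in bytes */

static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0,
              "the masking below requires a power-of-two line size");

/* Address of the first byte of the cache line containing addr. */
static inline uintptr_t line_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
}

/* True if both addresses fall within the same cache line. */
static inline bool same_line(uintptr_t a, uintptr_t b)
{
    return line_base(a) == line_base(b);
}
```

To check two members of a real object, pass their addresses cast to uintptr_t, for example same_line((uintptr_t)&s.foo, (uintptr_t)&s.bar). A load from any byte of a line brings in the whole line, so two members for which same_line is true cost one fetch rather than two.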
You have to be careful when ordering the members of a struct so that frequently used members are close to each other. For instance, suppose you have the following struct:
struct S {
    int foo;
    char name[64];
    int bar;
};
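If the members foo and bar are both used very often, accessing foo makes the hardware load the cache line containing foo, and accessing bar then makes it load a second line containing bar: the 64-byte name array sits between them, so with cache lines of 64 bytes or less they can never share a line. Most of the bytes loaded alongside foo and bar belong to name and may never be used. Now rewrite your struct as follows: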
struct S {
    int foo;
    int bar;
    char name[64];
};
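When you use foo, the hardware loads the cache line containing foo. Because bar now sits directly after foo, it will almost always be in that same line, so when you use bar the CPU won't have to wait for it to be fetched. The only exception is a struct that happens to start in the last few bytes of a line, which puts foo at the end of one line and bar at the start of the next.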
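The difference between the two layouts can be checked with offsetof. The sketch below is only an illustration under stated assumptions: a 64-byte cache line, the hypothetical type names S_spread and S_packed standing in for the two versions of struct S, and a hypothetical helper starts_sharing_line that tries every position within a line at which a struct of the given alignment could start.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64   /* assumed line size in bytes */

/* The two orderings of struct S, renamed so both can coexist. */
struct S_spread { int foo; char name[64]; int bar; };
struct S_packed { int foo; int bar; char name[64]; };

/* Of all the start positions within a cache line that respect `align`,
   count those that place the bytes at offsets off_a and off_b
   in the same cache line. */
static inline size_t starts_sharing_line(size_t off_a, size_t off_b,
                                         size_t align)
{
    assert(align > 0 && CACHE_LINE_SIZE % align == 0);

    size_t sharing = 0;
    for (size_t start = 0; start < CACHE_LINE_SIZE; start += align) {
        if ((start + off_a) / CACHE_LINE_SIZE ==
            (start + off_b) / CACHE_LINE_SIZE)
            sharing++;
    }
    return sharing;
}
```

With a 4-byte int, starts_sharing_line(offsetof(struct S_spread, foo), offsetof(struct S_spread, bar), 4) returns 0: foo and bar are 68 bytes apart, so no placement puts them in one line. The same call for struct S_packed returns 15 out of 16 possible start positions; the one failure is a struct starting 60 bytes into a line.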
The answer is: accessing a single struct member does not pull the entire struct into the cache, but it does pull in whichever neighboring members share the same cache line.
The hardware doesn't know the layout of the struct; it just loads the cache line containing the accessed member. And yes, larger structs are slower because they are spread across more cache lines.