How does a processor fetch cache lines?

When a processor pre-fetches a cache line of data, does it fetch forward from the requested address for the full size of the line, or does it fetch half a cache line backwards and half a cache line forwards around that address?

For example, assume the cache line is 4 bytes and the processor pre-fetches from address 0x06. Will it fetch the bytes at 0x06, 0x07, 0x08, 0x09, or will it fetch the bytes at 0x04, 0x05, 0x06, 0x07?

I need this info for a program which I am writing and need to optimize.

asked Dec 11 '25 08:12 by d2alphame


2 Answers

According to this source (which is naturally Intel-specific):

"The cache line size is 32 bytes, or 256 bits. A cache line is filled by a burst of four reads on the processor’s 64-bit data bus."

This means 8 bytes are fetched in parallel from main memory. Within those 8 bytes there is no first or last: they arrive simultaneously, because they travel over a 64-bit-wide bus.

It takes 4 reads to fill a cache line, and Intel does not seem to specify the order of these 4 reads, which leaves you with some choices, e.g.:

  • assume that there is no specific order
  • assume the addresses are fetched from lowest to highest, or vice versa.

The first assumption is of course the safest, since as far as I can find the order is undocumented (so it could depend on the processor model or other factors).

answered Dec 13 '25 01:12 by nos


Cache lines have to be aligned, so if the first read (or other transaction) that misses and triggers a cache-line fetch falls in the middle of a cache line, the cache will go back and read the whole line: both the part before your address and the part after it.

In general, the cache uses a portion of the address to determine hit or miss. If, say, the cache line were 256 bytes, the address bits used to determine hit/miss would start at bit 8, and the size of the cache (its depth and number of ways) would determine how many bits to look at. Using that example, if an access at address 0x123 produced a miss, the cache line spanning 0x100-0x1FF would be read.

If it worked the other way, it would take a lot more logic and work and cause a lot of confusion. If a cache line could start on any byte, it would be harder to determine hit/miss, and you could end up with overlapping cache lines (the same item of data in more than one place) that would have to be managed, making the cache slower overall.

answered Dec 13 '25 01:12 by old_timer