Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How many bytes does a Xeon bring into the cache per memory access?

I am working on a system, written in C++, running on a Xeon on Linux, that needs to run as fast as possible. There is a large data structure (basically an array of structs) held in RAM, over 10 GB, and elements of it need to be accessed periodically. I want to revise the data structure to work with the system's caching mechanism as much as possible.

Currently, accesses are done mostly randomly across the structure, and each time 1-4 32-bit ints are read. It is a long time before another read occurs in the same place, so there is no benefit from the cache.

Now I know that when you read a byte from a random location in RAM, more than just that byte is brought into the cache. My question is how many bytes are brought in? Is it 16, 32, 64, 4096? Is this called a cache line?

I am looking to redesign the data structure to minimize random RAM accesses and work with the cache instead of against it. Knowing how many bytes are pulled into the cache on a random access will inform the design choices I make.

Update (October 2014): Shortly after I posed the question above the project was put on hold. It has since resumed and based on suggestions in answers below, I performed some experiments around RAM access, because it seemed likely that TLB thrash was happening. I revised the program to run with huge pages (2MB instead of the standard 4KB), and observed a small speedup, about 2.5%. I found great information about setting up for huge pages here and here.

like image 646
Randall Cook Avatar asked Dec 23 '11 20:12

Randall Cook


3 Answers

Today’s CPUs fetch memory in chunks of (typically) 64 bytes, called cache lines. When you read a particular memory location, the entire cache line is fetched from the main memory into the cache.

More here : http://igoro.com/archive/gallery-of-processor-cache-effects/

like image 161
Software_Designer Avatar answered Nov 02 '22 19:11

Software_Designer


A cache line for any current Xeon processor is 64 bytes. One other thing that you might want to think about is the TLB. If you are really doing random accesses across 10GB of memory then you are likely to have a lot of TLB misses which can potentially be as costly as cache misses. You can get work around with with large pages, but it's something to keep in mind.

like image 3
Gabriel Southern Avatar answered Nov 02 '22 20:11

Gabriel Southern


Old SO question that has some info that might be of use to you (in particular the first answer where to look for Linux CPU info - responder doesn't mention line size proper, but 'other info' on top of associativity etc). Question is for x86, but answers are more general. Worth a look.

Where is the L1 memory cache of Intel x86 processors documented?

like image 2
gnometorule Avatar answered Nov 02 '22 19:11

gnometorule