
Will cache-line-aligned memory allocation pay off?

I only know the basic ideas behind aligned memory allocation. I have never cared much about alignment because I am not an assembly programmer and have no experience with MMX/SIMD. I also think of it as one of the premature optimizations.

These days people talk more and more about cache hits, cache coherence, optimizing for size, and so on. Some source code even allocates memory explicitly aligned to CPU cache lines.

Frankly, I don't know what the cache line size of my i7 CPU is. I know a large alignment will do no harm. But will it really pay off without SIMD?

Let's say a program has 100,000 items of 100 bytes each, and accessing this data is the most intensive work the program does.

If we change the data structure so that every 100-byte item is aligned to 16 bytes, is a noticeable performance gain possible? 10%? 5%?

9dan asked Jan 05 '11 14:01



1 Answer

This is one of my favorite recent blog posts about cache effects: http://igoro.com/archive/gallery-of-processor-cache-effects/

jcopenha answered Nov 09 '22 22:11
