
Does alignment really matter for performance in C++11?

Stroustrup's book advises ordering the members of a struct from the biggest to the smallest. Has anyone actually measured whether this makes a difference, and is it worth thinking about when writing code?

user3111311 asked Dec 28 '13 00:12





2 Answers

Alignment matters not only for performance, but also for correctness. Some architectures will fail with a processor trap if the data is not aligned correctly, or will access the wrong memory location. On others, access to unaligned variables is broken into multiple accesses and bit shifts (often inside the hardware, sometimes by an OS trap handler), losing atomicity.
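To make the correctness point concrete, here is a minimal sketch of reading a value that may sit at a misaligned address. The helper name `load_u32` is hypothetical and not from the answer. It copies the bytes with `std::memcpy` instead of dereferencing a cast pointer, because dereferencing a misaligned `uint32_t*` is undefined behaviour in C++ and may trap on stricter hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the answer): read a 32-bit value that may
// start at any byte offset. std::memcpy has no alignment requirement, so this
// is well defined even when p is not a multiple of alignof(std::uint32_t).
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Round trip through a deliberately odd offset.
inline void demo_unaligned_load() {
    unsigned char buf[8] = {};
    const std::uint32_t in = 0xDEADBEEFu;
    std::memcpy(buf + 1, &in, sizeof in);  // buf + 1 may not be 4-byte aligned
    assert(load_u32(buf + 1) == in);
}
```

Optimizing compilers typically turn the `memcpy` into a single load on targets that tolerate unaligned access, so this form usually costs nothing there.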

The advice to sort members in descending order of size is for optimal packing (minimum space wasted by padding), not for alignment or speed. Members will be correctly aligned no matter what order you list them in, unless you request non-conformant layout using specialized pragmas (e.g. the non-portable #pragma pack) or keywords. Total structure size is affected by padding and does affect speed, but often a different ordering is the optimal one.
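The claim that members stay aligned in any declaration order can be checked at compile time. The struct `Mixed` below is a made-up example with members in a deliberately awkward order; it is only a sketch of how one might verify the layout, not code from the answer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct: small and large members interleaved on purpose.
struct Mixed {
    std::uint8_t  a;
    std::uint32_t b;
    std::uint8_t  c;
    std::uint16_t d;
};

// Each member's offset is a multiple of that member's alignment,
// whatever order the members were declared in.
static_assert(offsetof(Mixed, b) % alignof(std::uint32_t) == 0, "b is aligned");
static_assert(offsetof(Mixed, d) % alignof(std::uint16_t) == 0, "d is aligned");

// The size is a multiple of the struct's alignment, so every element of an
// array of Mixed is aligned too.
static_assert(sizeof(Mixed) % alignof(Mixed) == 0, "array elements stay aligned");

// The members hold 1 + 4 + 1 + 2 = 8 bytes of data; anything beyond that is
// padding. The exact size is ABI-dependent (12 is common), so only the lower
// bound is asserted here.
static_assert(sizeof(Mixed) >= 8, "size covers all members");
```

The price of the awkward order shows up only in `sizeof(Mixed)`, which is the point of the ordering advice.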

For best performance, you should try to get members which are used together into the same cache line, and members that are accessed by different threads into different cache lines. Sometimes that means a lot of padding to get a cross-thread shared variable alone in its own cache line. But that's better than taking a performance hit from false sharing.
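A minimal sketch of the "own cache line" idea, assuming 64-byte cache lines (common on x86-64, but an assumption here). The names `kCacheLine`, `PaddedCounter` and `run_two_counters` are invented for illustration. C++17 also offers `std::hardware_destructive_interference_size` in `<new>` as a portable hint for this constant.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Assumption: cache lines are 64 bytes on the target machine.
constexpr std::size_t kCacheLine = 64;

// alignas rounds both the alignment and the size up to a full cache line,
// so two adjacent PaddedCounter objects never share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(alignof(PaddedCounter) == kCacheLine, "starts on a line boundary");
static_assert(sizeof(PaddedCounter) % kCacheLine == 0, "occupies whole lines");

// Two threads, each incrementing its own counter. Without the padding the
// two atomics would likely share one cache line and the cores would keep
// invalidating each other's copy (false sharing).
inline long run_two_counters(long iterations) {
    PaddedCounter counters[2];
    auto work = [iterations](PaddedCounter& c) {
        for (long i = 0; i < iterations; ++i)
            c.value.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread t0([&counters, &work] { work(counters[0]); });
    std::thread t1([&counters, &work] { work(counters[1]); });
    t0.join();
    t1.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

Each counter here wastes most of its cache line on padding, which is the trade-off the answer describes: more memory in exchange for no cross-thread interference.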

Ben Voigt answered Oct 05 '22 20:10



Just to add to Ben's great answer:

Defining struct members in the same order they are later accessed in your application will reduce cache misses and possibly increase performance. The benefit applies only when the entire structure does not fit into L1 cache.

On the other hand, ordering the members from biggest to smallest may reduce overall memory usage, which may be important when storing an array of small structures.

Let's assume an architecture (I don't know them that well; I think this holds for 32-bit gcc with default settings, and someone will correct me in the comments if not) on which this structure:

#include <cstdint>

struct MemoryUnused {
    uint8_t  val0;
    uint16_t val1;
    uint8_t  val2;
    uint16_t val3;
    uint8_t  val4;
    uint32_t val5;
    uint8_t  val6;
};

takes 20 bytes in memory, while this:

struct MemoryNotLost {
    uint32_t val5;
    uint16_t val1;
    uint16_t val3;
    uint8_t  val0;
    uint8_t  val2;
    uint8_t  val4;
    uint8_t  val6;
};

takes 12. That's 8 bytes lost to padding, a 67% increase over the size of the smaller struct. With a large array of such structs the gain would be significant and, simply because less memory is used, the number of cache misses will decrease.
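The sizes can be checked directly rather than assumed. The sketch below repeats the two structs from this answer and adds hypothetical helpers (`kPayloadBytes`, `padding_bytes`, `array_bytes`) that are not part of the original. The exact numbers depend on the ABI, so the code asserts only the relation between the two layouts and leaves the concrete values to be printed or inspected.

```cpp
#include <cstddef>
#include <cstdint>

struct MemoryUnused {
    std::uint8_t  val0;
    std::uint16_t val1;
    std::uint8_t  val2;
    std::uint16_t val3;
    std::uint8_t  val4;
    std::uint32_t val5;
    std::uint8_t  val6;
};

struct MemoryNotLost {
    std::uint32_t val5;
    std::uint16_t val1;
    std::uint16_t val3;
    std::uint8_t  val0;
    std::uint8_t  val2;
    std::uint8_t  val4;
    std::uint8_t  val6;
};

// Bytes of actual member data, identical for both layouts: 4*1 + 2*2 + 4 = 12.
constexpr std::size_t kPayloadBytes =
    4 * sizeof(std::uint8_t) + 2 * sizeof(std::uint16_t) + sizeof(std::uint32_t);

// Padding bytes the compiler added to layout T (hypothetical helper).
template <typename T>
constexpr std::size_t padding_bytes() { return sizeof(T) - kPayloadBytes; }

// Memory footprint of an array of n elements of T (hypothetical helper).
template <typename T>
constexpr std::size_t array_bytes(std::size_t n) { return n * sizeof(T); }

// Sorting by descending size should never make the struct larger.
static_assert(sizeof(MemoryNotLost) <= sizeof(MemoryUnused),
              "sorted layout is at least as compact");
```

On an ABI where the sizes are 20 and 12 bytes, `array_bytes<MemoryUnused>(1000000)` is 20,000,000 and `array_bytes<MemoryNotLost>(1000000)` is 12,000,000, so a million-element array saves 8 MB.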

Dariusz answered Oct 05 '22 18:10
