How is the size of a C++ class determined?

Tags:

memory-alignment

Summary: How does the compiler statically determine the size of a C++ class during compilation?

Details:

I'm trying to understand what the rules are for determining how much memory a class will use, and also how the memory will be aligned.

For example, the following code declares four classes. The first two are each 16 bytes. The third is 48 bytes, even though it contains the same data members as the first two combined. The fourth class has the same data members as the third, just in a different order, yet it is 32 bytes.

#include <xmmintrin.h>
#include <stdio.h>

class TestClass1 {
  __m128i vect;
};

class TestClass2 {
  char buf[8];
  char buf2[8];
};

class TestClass3 {
  char buf[8];
  __m128i vect;
  char buf2[8];
};

class TestClass4 {
  char buf[8];
  char buf2[8];
  __m128i vect;
};


TestClass1 *ptr1;
TestClass2 *ptr2;
TestClass3 *ptr3;
TestClass4 *ptr4;
int main() {
  ptr1 = new TestClass1();
  ptr2 = new TestClass2();
  ptr3 = new TestClass3();
  ptr4 = new TestClass4();
  // sizeof yields a size_t, so %zu is the matching format specifier.
  printf("sizeof TestClass1 is: %zu\t TestClass2 is: %zu\t TestClass3 is: %zu\t TestClass4 is: %zu\n", sizeof(*ptr1), sizeof(*ptr2), sizeof(*ptr3), sizeof(*ptr4));
  return 0;
}

I know that the answer has something to do with the alignment of the class's data members. But I am trying to understand exactly what these rules are and how they get applied during compilation. I have a class with a __m128i data member that is not 16-byte aligned, and this results in a segfault when the compiler generates code that uses movaps to access the data.

asked Jan 24 '13 21:01

Gabriel Southern

2 Answers

It is entirely up to the compiler how the size of a class is determined. A compiler will usually compile to match a certain application binary interface, which is platform dependent.

The behaviour you've observed, however, is pretty typical. The compiler tries to place each member at an offset that is a multiple of its alignment requirement, which for fundamental and vector types is usually equal to its size. In the case of TestClass3, one of the members is of type __m128i, and sizeof(__m128i) == 16, so the compiler will try to align that member to begin at a byte that is a multiple of 16. The first member is of type char[8], so it takes up 8 bytes. If the compiler were to place the __m128i object directly after this first member, it would start at position 8, which is not a multiple of 16:

0               8               16              24              32              40
┌───────────────┬───────────────────────────────┬───────────────┬┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
│    char[8]    │            __m128i            │    char[8]    │           
└───────────────┴───────────────────────────────┴───────────────┴┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄

So instead it prefers to do this:

0               8               16              24              32              40              48
┌───────────────┬┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┬───────────────────────────────┬───────────────┬┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┐
│    char[8]    │   (padding)   │            __m128i            │    char[8]    │   (padding)   │
└───────────────┴┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┴───────────────────────────────┴───────────────┴┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┘

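The last member ends at byte 40, and the compiler then pads the class up to the next multiple of 16 so that every element of an array of TestClass3 stays aligned. This gives it a size of 48 bytes.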

When you reorder the members to get TestClass4 the layout becomes:

0               8               16              24              32              40
┌───────────────┬───────────────┬───────────────────────────────┬┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
│    char[8]    │    char[8]    │           __m128i             │        
└───────────────┴───────────────┴───────────────────────────────┴┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄

Now everything is correctly aligned without any padding: the arrays are at offsets that are multiples of 1 (the size of their elements), the __m128i object is at an offset that is a multiple of 16, and the total size is 32 bytes.

The reason the compiler doesn't just do this rearrangement itself is that the standard specifies that later members of a class should have higher addresses:

Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object.
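
The layouts drawn above can be checked at compile time. The following is a minimal sketch, not the asker's code: Vec16, Layout3 and Layout4 are hypothetical names, and Vec16 is an assumed stand-in for __m128i (16 bytes, 16-byte aligned) so that the example does not depend on SSE headers. The asserted numbers are what mainstream ABIs produce; the standard itself does not guarantee them.

```cpp
#include <cstddef>  // offsetof

// Hypothetical stand-in for __m128i: 16 bytes with 16-byte alignment.
struct alignas(16) Vec16 {
  unsigned char bytes[16];
};

// Same member order as TestClass3 (members are public so offsetof applies).
struct Layout3 {
  char  buf[8];
  Vec16 vect;
  char  buf2[8];
};

// Same member order as TestClass4.
struct Layout4 {
  char  buf[8];
  char  buf2[8];
  Vec16 vect;
};

// Layout3: 8 bytes of padding before vect, 8 bytes of padding at the end.
static_assert(offsetof(Layout3, vect) == 16, "vect is pushed to offset 16");
static_assert(offsetof(Layout3, buf2) == 32, "buf2 directly follows vect");
static_assert(sizeof(Layout3) == 48, "40 bytes rounded up to a multiple of 16");

// Layout4: the two arrays fill bytes 0..15, so no padding is needed.
static_assert(offsetof(Layout4, vect) == 16, "vect is already aligned");
static_assert(sizeof(Layout4) == 32, "no trailing padding");

// Both classes take on the strictest alignment among their members.
static_assert(alignof(Layout3) == 16 && alignof(Layout4) == 16,
              "class alignment follows the most aligned member");
```

If one of these assertions fails on a particular compiler, that compiler is using a different layout rule, which is allowed because the layout is up to the implementation.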

answered Sep 24 '22 01:09

Joseph Mansfield

For POD (plain old data), the rules are typically:

Each member in the structure has some size s and some alignment requirement a.
The compiler starts with a size S set to zero and an alignment requirement A set to one (byte).
The compiler processes each member in the structure in order:

Consider the member’s alignment requirement a. If S is not currently a multiple of a, then add just enough bytes to S so that it is a multiple of a. This determines where the member will go; it will go at offset S from the beginning of the structure (for the current value of S).
Set A to the least common multiple¹ of A and a.
Add s to S, to set aside space for the member.

When the above process is done for each member, consider the structure’s alignment requirement A. If S is not currently a multiple of A, then add just enough to S so that it is a multiple of A.

The size of the structure is the value of S when the above is done.
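
The procedure above can be sketched as a small function. This is only an illustration of the typical rules, not what any particular compiler runs internally; Member, Layout, round_up and struct_layout are hypothetical names introduced here. Since alignment requirements are powers of two, the sketch tracks A as the largest alignment seen rather than computing a least common multiple.

```cpp
#include <cstddef>
#include <initializer_list>

// One member of a structure: its size s and its alignment requirement a.
struct Member {
  std::size_t size;
  std::size_t align;
};

// Result of the procedure: total size S and alignment requirement A.
struct Layout {
  std::size_t size;
  std::size_t align;
};

// Round n up to the next multiple of align (align must be nonzero).
std::size_t round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}

// Process the members in declaration order, as described in the steps above.
Layout struct_layout(std::initializer_list<Member> members) {
  std::size_t S = 0;  // running size
  std::size_t A = 1;  // running alignment requirement
  for (const Member& m : members) {
    S = round_up(S, m.align);      // pad so the member starts aligned
    if (m.align > A) A = m.align;  // for powers of two, LCM equals the maximum
    S += m.size;                   // set aside space for the member
  }
  S = round_up(S, A);              // trailing padding
  return Layout{S, A};
}
```

For instance, struct_layout({{8, 1}, {16, 16}, {8, 1}}) models TestClass3 and yields a size of 48, while struct_layout({{8, 1}, {8, 1}, {16, 16}}) models TestClass4 and yields 32.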

Additionally:

If any member is an array, its size is the number of elements multiplied by the size of each element, and its alignment requirement is the alignment requirement of an element.
If any member is a structure, its size and alignment requirement are calculated as above.
If any member is a union, its size is the size of its largest member plus just enough to make it a multiple of the least common multiple¹ of the alignments of all the members.
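
The array and union rules can be checked the same way. The sketch below uses a hypothetical union named Sample and hypothetical helpers max_of and round_up; rather than hard-coding sizes, it compares what the compiler reports against the formula, since the sizes and alignments of int and double are implementation-defined. It assumes a typical ABI that lays unions out as described above.

```cpp
#include <cstddef>

// Hypothetical helpers for the formula in the text.
constexpr std::size_t max_of(std::size_t a, std::size_t b) {
  return a > b ? a : b;
}
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) / align * align;
}

// Hypothetical union with members of different sizes and alignments.
union Sample {
  char   tag[9];  // array: 9 elements of size 1, alignment of char
  int    count;
  double value;
};

// An array's size is element count times element size,
// and its alignment is that of a single element.
static_assert(sizeof(char[9]) == 9 * sizeof(char), "array size rule");
static_assert(alignof(char[9]) == alignof(char), "array alignment rule");

// Largest member size and strictest member alignment.
constexpr std::size_t largest =
    max_of(sizeof(char[9]), max_of(sizeof(int), sizeof(double)));
constexpr std::size_t strictest =
    max_of(alignof(char), max_of(alignof(int), alignof(double)));

// The union is as large as its largest member, padded to its alignment.
static_assert(alignof(Sample) == strictest, "union alignment rule");
static_assert(sizeof(Sample) == round_up(largest, strictest), "union size rule");
```

On a platform where double is 8 bytes with 8-byte alignment, largest is 9 and strictest is 8, so the union occupies 16 bytes.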

Consider your TestClass3:

S starts at 0 and A starts at 1.
char buf[8] requires 8 bytes and alignment 1, so S is increased by 8 to 8, and A remains 1.
__m128i vect requires 16 bytes and alignment 16. First, S must be increased to 16 to give the correct alignment. Then A must be increased to 16. Then S must be increased by 16 to make space for vect, so S is now 32.
char buf2[8] requires 8 bytes and alignment 1, so S is increased by 8 to 40, and A remains 16.
At the end, S is 40, which is not a multiple of A (16), so S must be increased by 8 to 48.

So the size of TestClass3 is 48 bytes.

For elementary types (int, double, et cetera), the alignment requirements are implementation-defined and are usually largely determined by the hardware. On many processors, it is faster to load and store data when it has a certain alignment (usually when its address in memory is a multiple of its size). Beyond this, the rules above follow largely from logic; they put each member where it must be to satisfy alignment requirements without using more space than necessary.

Footnote

¹ I have worded this for a general case as using the least common multiple of alignment requirements. However, since alignment requirements are always powers of two, the least common multiple of any set of alignment requirements is the largest of them.

answered Sep 24 '22 01:09

Eric Postpischil
