Summary: How does the compiler statically determine the size of a C++ class during compilation?
Details:
I'm trying to understand what the rules are for determining how much memory a class will use, and also how the memory will be aligned.
For example, the following code declares 4 classes. The first 2 are each 16 bytes. But the third is 48 bytes, even though it contains the same data members as the first 2 combined. The fourth class has the same data members as the third, just in a different order, but it is 32 bytes.
#include <emmintrin.h>  // SSE2 header that declares __m128i
#include <stdio.h>

class TestClass1 {
    __m128i vect;
};

class TestClass2 {
    char buf[8];
    char buf2[8];
};

class TestClass3 {
    char buf[8];
    __m128i vect;
    char buf2[8];
};

class TestClass4 {
    char buf[8];
    char buf2[8];
    __m128i vect;
};

TestClass1 *ptr1;
TestClass2 *ptr2;
TestClass3 *ptr3;
TestClass4 *ptr4;

int main() {
    ptr1 = new TestClass1();
    ptr2 = new TestClass2();
    ptr3 = new TestClass3();
    ptr4 = new TestClass4();
    // sizeof yields a size_t, which printf formats with %zu
    printf("sizeof TestClass1 is: %zu\t TestClass2 is: %zu\t TestClass3 is: %zu\t TestClass4 is: %zu\n", sizeof(*ptr1), sizeof(*ptr2), sizeof(*ptr3), sizeof(*ptr4));
    return 0;
}
I know that the answer has something to do with the alignment of the data members of the class. But I am trying to understand exactly what these rules are and how they are applied during compilation, because I have a class with a __m128i data member that is not 16-byte aligned, and this results in a segfault when the compiler generates code that uses movaps to access the data.
In C++, the size of an empty structure or class is at least one byte (typically exactly one). Every object must have its own address, so even an empty object needs some storage to be distinguishable from its neighbours.
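As a minimal sketch of the point above, the following checks that an empty class has a nonzero size and that two such objects get different addresses. The type `Empty` and the helper names are hypothetical and used only for illustration.

```cpp
#include <cassert>

// Hypothetical type, for illustration only.
struct Empty {};

// Every complete object type has a nonzero size.
static_assert(sizeof(Empty) >= 1, "an empty class still occupies storage");

// Two elements of an array are distinct objects, so their addresses differ.
inline bool empty_objects_are_distinct() {
    Empty pair[2];
    return &pair[0] != &pair[1];
}

// Small self-check that can be called from main().
inline void check_empty_class() {
    assert(empty_objects_are_distinct());
}
```

Calling `check_empty_class()` from `main` should pass silently on any conforming compiler.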
The size of an object depends on its non-static data members, plus any padding the compiler inserts for alignment. For classes that contain virtual functions, a vptr (a pointer to the virtual table) is typically added to the object layout. So the size of an object is basically the size of its member variables, plus padding, plus the size of any vptrs.
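A rough way to observe the vptr is to compare two otherwise identical classes, one of which has a virtual function. The types below are hypothetical, and the size difference is an implementation detail of common ABIs rather than something the standard promises.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types, for illustration only.
struct Plain {
    int value;
};

struct WithVirtual {
    int value;
    virtual ~WithVirtual() = default;  // makes the class polymorphic
};

// Assumption: the implementation stores one hidden pointer per polymorphic
// object, as mainstream compilers do. The standard does not require this.
inline bool virtual_adds_size() {
    return sizeof(WithVirtual) > sizeof(Plain);
}

// Small self-check that can be called from main().
inline void check_vptr_overhead() {
    assert(virtual_adds_size());
}
```

On a typical 64-bit target `sizeof(WithVirtual)` comes out larger than `sizeof(int) + sizeof(void*)` because the hidden pointer also raises the alignment of the class, so tail padding is added.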
It is entirely up to the compiler how the size of a class is determined. A compiler will usually compile to match a certain application binary interface, which is platform dependent.
The behaviour you've observed, however, is pretty typical. The compiler is trying to align the members so that each begins at a multiple of its alignment requirement, which for fundamental and vector types usually equals its size. In the case of TestClass3, one of the members is of type __m128i, and sizeof(__m128i) == 16, so the compiler aligns that member to begin at an offset that is a multiple of 16. The first member is of type char[8], so it takes up 8 bytes. If the compiler were to place the __m128i object directly after this first member, it would start at offset 8, which is not a multiple of 16:
0               8               16              24              32              40
┌───────────────┬───────────────────────────────┬───────────────┬┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
│    char[8]    │            __m128i            │    char[8]    │
└───────────────┴───────────────────────────────┴───────────────┴┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
So instead it prefers to do this:
0               8               16              24              32              40
┌───────────────┬┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┬───────────────────────────────┬───────────────┐┄┄┄
│    char[8]    │               │            __m128i            │    char[8]    │
└───────────────┴┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┴───────────────────────────────┴───────────────┘┄┄┄
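The members end at offset 40, and the size is then rounded up to a multiple of the 16-byte alignment, which gives a total size of 48 bytes.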
When you reorder the members to get TestClass4, the layout becomes:
0               8               16              24              32              40
┌───────────────┬───────────────┬───────────────────────────────┬┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
│    char[8]    │    char[8]    │            __m128i            │
└───────────────┴───────────────┴───────────────────────────────┴┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄
Now everything is correctly aligned: the arrays are at offsets that are multiples of 1 (the size of their elements) and the __m128i object is at an offset that is a multiple of 16. The total size is 32 bytes.
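The two layouts can be checked at compile time with offsetof and sizeof. This is a sketch under assumptions: `Vec16` is a hypothetical stand-in for __m128i (16 bytes, 16-byte aligned) so the example does not depend on SSE headers, and the asserted offsets are those produced by mainstream ABIs, not values the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for __m128i: 16 bytes, 16-byte aligned.
struct alignas(16) Vec16 {
    unsigned char bytes[16];
};

// Same member order as TestClass3.
struct VectorInMiddle {
    char buf[8];
    Vec16 vect;
    char buf2[8];
};

// Same member order as TestClass4.
struct VectorLast {
    char buf[8];
    char buf2[8];
    Vec16 vect;
};

// Layout expected on common ABIs (an assumption, not a standard guarantee).
static_assert(offsetof(VectorInMiddle, vect) == 16, "8 bytes of padding precede vect");
static_assert(offsetof(VectorInMiddle, buf2) == 32, "buf2 follows vect directly");
static_assert(sizeof(VectorInMiddle) == 48, "8 bytes of tail padding");
static_assert(offsetof(VectorLast, vect) == 16, "no padding needed before vect");
static_assert(sizeof(VectorLast) == 32, "no padding at all");

// Runtime check that an address sits on a 16-byte boundary.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Small self-check that can be called with any VectorInMiddle object.
inline void check_vector_member(const VectorInMiddle& obj) {
    assert(is_aligned_16(&obj.vect));
}
```

A helper like `is_aligned_16` is also a cheap way to confirm, before an aligned load such as movaps runs, that a vector member really is where the compiler assumes it is.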
The reason the compiler doesn't just do this rearrangement itself is that the standard specifies that later members of a class should have higher addresses:
Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object.
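For POD (plain old data), the rules are typically:
- Each member has some size s and some alignment requirement a.
- The compiler starts with a size S of zero and an alignment requirement A of one (byte).
- The compiler processes each member in order:
  1. If S is not a multiple of a, add just enough bytes to S to make it a multiple of a. The member goes at offset S.
  2. Set A to the least common multiple (see footnote 1) of A and a.
  3. Add s to S to make space for the member.
- After all members are processed, if S is not a multiple of A, add just enough bytes to S to make it a multiple of A.
The size of the structure is the value of S when the above is done.
Additionally:
- If a member is an array, its size is the number of elements multiplied by the size of each element, and its alignment requirement is that of an element.
- If a member is itself a structure, its size and alignment requirement are calculated as above.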
Consider your TestClass3:
- char buf[8] requires 8 bytes and alignment 1, so S is increased by 8 to 8, and A remains 1.
- __m128i vect requires 16 bytes and alignment 16. First, S must be increased to 16 to give the correct alignment. Then A must be increased to 16. Then S must be increased by 16 to make space for vect, so S is now 32.
- char buf2[8] requires 8 bytes and alignment 1, so S is increased by 8 to 40, and A remains 16.
- At the end, S is 40, which is not a multiple of A (16), so S must be increased by 8 to 48.
So the size of TestClass3 is 48 bytes.
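The bookkeeping with S and A used in this walkthrough can be sketched as a small function. The names `Member`, `Layout`, `round_up` and `compute_layout` are hypothetical, and the sketch assumes every alignment is a power of two, so the largest alignment seen so far serves as the least common multiple. It models the typical procedure, not any particular compiler's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one member: size s and alignment a, in bytes.
struct Member {
    std::size_t size;
    std::size_t align;  // assumed to be a power of two
};

// Result of laying out a sequence of members.
struct Layout {
    std::size_t size;                  // final S
    std::size_t align;                 // final A
    std::vector<std::size_t> offsets;  // offset of each member
};

// Rounds n up to the next multiple of align.
inline std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

inline Layout compute_layout(const std::vector<Member>& members) {
    Layout out{0, 1, {}};
    for (const Member& m : members) {
        assert(m.align != 0);
        out.size = round_up(out.size, m.align);  // pad so the member is aligned
        out.offsets.push_back(out.size);         // the member goes at offset S
        if (m.align > out.align) {
            out.align = m.align;                 // max equals lcm for powers of two
        }
        out.size += m.size;                      // make space for the member
    }
    out.size = round_up(out.size, out.align);    // tail padding
    return out;
}
```

For the TestClass3 member list, `compute_layout({{8, 1}, {16, 16}, {8, 1}})` yields offsets 0, 16 and 32 with a size of 48, matching the 48 bytes reported in the question. The TestClass4 order, `{{8, 1}, {8, 1}, {16, 16}}`, yields a size of 32.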
For elementary types (int, double, et cetera), the alignment requirements are implementation-defined and are usually largely determined by the hardware. On many processors, it is faster to load and store data when it has a certain alignment (usually when its address in memory is a multiple of its size). Beyond this, the rules above follow largely from logic; they put each member where it must be to satisfy alignment requirements without using more space than necessary.
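Because the concrete values are implementation-defined, a portable check can only assert structural properties of alignof, not specific numbers. The helper names below are hypothetical; this is a sketch of what can be relied on, not a list of what any given platform reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if n is a nonzero power of two.
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Alignments are always powers of two, whatever their actual values are.
static_assert(is_power_of_two(alignof(int)) && is_power_of_two(alignof(double)),
              "alignment requirements are powers of two");

// char has the weakest possible alignment.
static_assert(alignof(char) == 1, "char can live at any address");

// A type's size is a multiple of its alignment, so array elements stay aligned.
static_assert(sizeof(int) % alignof(int) == 0 && sizeof(double) % alignof(double) == 0,
              "size is a multiple of alignment");

// True if p satisfies the alignment requirement of T.
template <typename T>
bool is_suitably_aligned(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Objects created by the compiler always satisfy their type's alignment.
inline void check_locals() {
    int i = 0;
    double d = 0.0;
    assert(is_suitably_aligned(&i));
    assert(is_suitably_aligned(&d));
}
```

Printing `alignof(int)` and `alignof(double)` on the target platform shows the actual values the implementation chose.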
1 I have worded this for a general case as using the least common multiple of alignment requirements. However, since alignment requirements are always powers of two, the least common multiple of any set of alignment requirements is the largest of them.