Why are the values returned by sizeof() compiler dependent?

Question

struct A
{
    char c;
    double d;
} a;

In mingw32-gcc.exe: sizeof a = 16
In gcc 4.6.3(ubuntu): sizeof a = 12

Why are they different? I think it should be 16. Does gcc 4.6.3 do some optimizations?

2 revs, 2 users 63% · Accepted Answer

Compilers may align data structures for the target architecture when needed. This may be done purely to improve the runtime performance of the application, or in some cases it is required by the processor (i.e. the program will not work if the data is not aligned).

For example, most (but not all) SSE2 instructions require data to be aligned on a 16-byte boundary. To put it simply, everything in computer memory has an address. Let's say we have a simple array of doubles, like this:

double data[256];

In order to use SSE2 instructions that require 16-byte alignment, one must make sure that the address &data[0] is a multiple of 16.
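As a minimal sketch of that check, assuming a C11 compiler and a platform where converting a pointer to uintptr_t yields its numeric address (true on mainstream targets, but implementation-defined): the helper names is_aligned and data_is_sse_aligned are made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>   /* alignas (C11) */
#include <stdint.h>     /* uintptr_t */

/* Ask the compiler explicitly for 16-byte alignment instead of hoping for it. */
static alignas(16) double data[256];

/* Returns 1 if ptr is a multiple of `alignment` bytes, 0 otherwise. */
int is_aligned(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Returns 1 if &data[0] sits on a 16-byte boundary. */
int data_is_sse_aligned(void)
{
    return is_aligned(&data[0], 16);
}
```

Without the alignas(16), an array of double is only guaranteed the alignment of double itself, so the check could fail depending on the compiler and where the array is placed.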

The alignment requirements differ from one architecture to another. On x86_64, it is recommended that all structures larger than 16 bytes align on 16-byte boundaries. In general, for the best performance, align data as follows:

Align 8-bit data at any address
Align 16-bit data to be contained within an aligned four-byte word
Align 32-bit data so that its base address is a multiple of four
Align 64-bit data so that its base address is a multiple of eight
Align 80-bit data so that its base address is a multiple of sixteen
Align 128-bit data so that its base address is a multiple of sixteen
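To make "base address is a multiple of N" concrete, here is a small sketch of the usual round-up arithmetic. The function name align_up is an assumption of mine, and it only handles power-of-two alignments, which covers every entry in the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds `offset` up to the next multiple of `alignment`.
   `alignment` must be a non-zero power of two (1, 2, 4, 8, 16, ...). */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, align_up(9, 8) gives 16, which is where a 64-bit value would be placed if the previous data ended at offset 9.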

Interestingly enough, most x86_64 CPUs work with both aligned and unaligned data. However, if the data is not aligned properly, the CPU may execute the code significantly slower.

When the compiler takes this into consideration, it may implicitly align the members of a structure, and that affects the structure's size. For example, let's say we have a structure like this:

struct A {
    char a;
    int b;
};

Assuming x86_64, an int is 32 bits, or 4 bytes. Therefore, it is recommended that the address of b always be a multiple of 4. But because a occupies only 1 byte, b would otherwise start at offset 1. Therefore, the compiler implicitly adds 3 bytes of padding between a and b:

struct A {
    char a;
    char __pad0[3]; /* This would be added by compiler,
                       without any field names - __pad0 is for
                       demonstration purposes */
    int b;
};
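You do not have to guess where the padding went: offsetof from stddef.h reports it. A rough sketch, where the function names offset_of_b and padding_before_b are mine and the actual numbers depend on your compiler and target:

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

struct A {
    char a;
    int b;
};

/* Byte offset of b from the start of the struct. */
size_t offset_of_b(void)
{
    return offsetof(struct A, b);
}

/* Number of padding bytes the compiler placed between a and b. */
size_t padding_before_b(void)
{
    return offsetof(struct A, b) - sizeof(char);
}
```

On a typical x86_64 build I would expect offset_of_b() to be 4 and padding_before_b() to be 3, but treat those as typical values rather than guarantees.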

How the compiler does this depends not only on the compiler and architecture, but also on the settings (flags) you pass to the compiler. This behavior can also be changed with special language constructs. For example, one can ask the compiler not to perform any padding with the packed attribute, like this:

struct A {
    char a;
    int b;
} __attribute__((packed));
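To see the effect, you can compare the two layouts side by side. This sketch assumes GCC or Clang, since __attribute__((packed)) is a compiler extension and not standard C; the struct names Padded and Packed are only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

struct Padded {
    char a;
    int b;
};

struct Packed {
    char a;
    int b;
} __attribute__((packed));   /* GCC/Clang extension */

/* With packing there is no padding at all, so the size is the plain sum. */
static_assert(sizeof(struct Packed) == sizeof(char) + sizeof(int),
              "packed struct should contain no padding");

/* Bytes saved by packing on the current compiler and target. */
size_t bytes_saved_by_packing(void)
{
    return sizeof(struct Padded) - sizeof(struct Packed);
}
```

Keep in mind that b in the packed version is no longer aligned, so accessing it may be slower, and a pointer to it is not safe to use as an ordinary int pointer on architectures that require alignment.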

In your case, mingw32-gcc.exe has simply added 7 bytes of padding between c and d to align d on an 8-byte boundary, whereas gcc 4.6.3 on Ubuntu has added only 3 bytes to align d on a 4-byte boundary.
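The two results can be reproduced with simple arithmetic. The following is a simplified model, not what either compiler literally runs: it assumes a two-member struct whose first member has alignment 1 (like char c), and the names round_up and two_member_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the next multiple of align (align must be non-zero). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Models the size of a struct with two members, where the first member
   has alignment 1 and the second has alignment `second_align`. */
size_t two_member_size(size_t first_size, size_t second_size, size_t second_align)
{
    /* The second member starts at the first suitably aligned offset. */
    size_t offset = round_up(first_size, second_align);
    /* The total size is rounded up so that arrays of the struct stay aligned. */
    return round_up(offset + second_size, second_align);
}
```

With a 1-byte char and an 8-byte double, two_member_size(1, 8, 8) is 16 (the mingw32 result) and two_member_size(1, 8, 4) is 12 (the Ubuntu gcc 4.6.3 result).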

Unless you are performing some optimizations, trying to use a special extended instruction set, or have specific requirements for your data structures, I'd recommend that you do not depend on specific compiler behavior. Always assume that your structure might get padded, and that it might get padded differently between architectures, compilers and/or compiler versions. Otherwise you'd need to semi-manually ensure data alignment and structure sizes using compiler attributes and settings, and make sure it all works across all the compilers and platforms you are targeting, using unit tests or maybe even static assertions.
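As one possible shape for the static assertions mentioned above, here is a sketch assuming C11. The struct WireHeader and its fields are hypothetical; the point is only that a layout mismatch stops the build instead of surfacing at runtime.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Hypothetical record whose layout must be identical on every target. */
struct WireHeader {
    uint32_t length;
    uint16_t type;
    uint16_t flags;
};

/* Compilation fails if any compiler lays the struct out differently. */
static_assert(sizeof(struct WireHeader) == 8, "WireHeader must be 8 bytes");
static_assert(offsetof(struct WireHeader, type) == 4, "type must be at offset 4");
static_assert(offsetof(struct WireHeader, flags) == 6, "flags must be at offset 6");

/* Padding bytes inserted by the compiler; 0 when the assertions above hold. */
size_t wire_header_padding(void)
{
    return sizeof(struct WireHeader)
         - (sizeof(uint32_t) + 2 * sizeof(uint16_t));
}
```

Using fixed-width types from stdint.h removes the other source of variation, namely that int or long can have different sizes on different platforms.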

For more information, please check out:

Data Alignment article on Wikipedia
Data Alignment when Migrating to 64-Bit Intel® Architecture
GCC Variable Attributes

Hope it helps. Good Luck!

How to minimize padding:

It is always good to have all your struct members properly aligned while keeping your structure size reasonable. Consider these 2 struct variants with the members rearranged (from now on, assume sizeof char, short, int, long, long long to be 1, 2, 4, 4, 8 respectively):

struct A
{
    char a;
    short b;
    char c;
    int d;
};

struct B
{
    char a;
    char c;
    short b;
    int d;
};

Both structures are supposed to hold the same data, but while sizeof(struct A) will be 12 bytes, sizeof(struct B) will be 8 thanks to a well-thought-out member order that eliminates the implicit padding:

struct A
{
    char a;
    char __pad0[1]; // implicit compiler padding
    short b;
    char c;
    char __pad1[3]; // implicit compiler padding
    int d;
};

struct B // no implicit padding
{
    char a;
    char c;
    short b;
    int d;
};
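If you want to measure this on your own compiler, something along these lines should work. The helper names padding_bytes_A and padding_bytes_B are mine, and the results depend on the target, so the 12 and 8 above are what I would expect only under the stated size assumptions.

```c
#include <assert.h>
#include <stddef.h>

struct A {
    char a;
    short b;
    char c;
    int d;
};

struct B {
    char a;
    char c;
    short b;
    int d;
};

/* Sum of the member sizes; identical for both structs. */
#define MEMBER_BYTES (2 * sizeof(char) + sizeof(short) + sizeof(int))

/* Padding bytes inside struct A on the current compiler and target. */
size_t padding_bytes_A(void)
{
    return sizeof(struct A) - MEMBER_BYTES;
}

/* Padding bytes inside struct B on the current compiler and target. */
size_t padding_bytes_B(void)
{
    return sizeof(struct B) - MEMBER_BYTES;
}
```

Printing both values with printf("%zu") in a test program is a quick way to confirm that a reordering actually paid off.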

Rearranging struct members becomes error prone as the member count grows. To make it less error prone, put the longest members at the beginning and the shortest at the end:

struct B // no implicit padding
{
    int d;
    short b;
    char a;
    char c;
};

Implicit padding at the end of a struct:

Depending on the compiler, settings, platform etc. that you use, you may notice that the compiler adds padding not only before struct members but also at the end (i.e. after the last member). The structure below:

struct abcd
{
    long long a;
    char b;
};

may occupy 12 or 16 bytes (the worst compilers will allow it to be 9 bytes). This padding is easily overlooked, but it is very important if your structure will be an array element. It ensures that the a member in subsequent array elements is properly aligned too.
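A short sketch of how to observe the tail padding and its effect on arrays, assuming C11 for _Alignof and the usual pointer-to-integer conversion; the function names tail_padding and second_element_is_aligned are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct abcd {
    long long a;
    char b;
};

/* Bytes of padding after the last member b. */
size_t tail_padding(void)
{
    return sizeof(struct abcd) - (offsetof(struct abcd, b) + sizeof(char));
}

/* Returns 1 if member a of the second array element is properly aligned. */
int second_element_is_aligned(void)
{
    static struct abcd cells[2];
    return ((uintptr_t)&cells[1].a % _Alignof(long long)) == 0;
}
```

Because array elements are laid out exactly sizeof(struct abcd) bytes apart, the tail padding is what keeps cells[1].a on the same boundary as cells[0].a.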

Final and random thoughts:

It will never hurt (and may actually save) you if, when working with structs, you follow this advice:

Do not rely on the compiler to interleave your struct members with proper padding.
Make sure your struct (if outside an array) is aligned to the boundary required by its longest member.
Make sure you arrange your struct members so that the longest are placed first and the last member is the shortest.
Make sure you explicitly pad your struct (if needed) so that if you create an array of structs, every structure member has proper alignment.
Make sure that arrays of your structs are properly aligned too: although your struct may require 8-byte alignment, your compiler may align your array on a 4-byte boundary.
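The last two points could look roughly like this in C11. The struct Sample, its reserved member, and the array samples are hypothetical names, and alignas(8) assumes that 8 is at least the natural alignment of long long on your target.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdalign.h>   /* alignas (C11) */
#include <stddef.h>
#include <stdint.h>

struct Sample {
    long long value;
    char tag;
    char reserved[sizeof(long long) - 1];   /* explicit padding, longest member first */
};

/* If this fails, the compiler added padding we did not account for. */
static_assert(sizeof(struct Sample) == 2 * sizeof(long long),
              "Sample should contain no implicit padding");

/* Force the whole array onto an 8-byte boundary. */
static alignas(8) struct Sample samples[4];

/* Returns 1 if the array starts on an 8-byte boundary. */
int samples_are_aligned(void)
{
    return ((uintptr_t)&samples[0] % 8) == 0;
}
```

Since the struct size is a multiple of 8 under the assumptions at the top of this section, aligning the first element is enough to align every element.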

Bathsheba · Answer

The values returned by sizeof for structs are not mandated by any C standard. It's up to the compiler and machine architecture.

For example, it can be optimal to align data members on 4-byte boundaries, in which case char c will effectively occupy 4 bytes once padding is counted.

Tags:

c++

c

sizeof

tczf
