
Why isn't sizeof for a struct equal to the sum of sizeof of each member?

Why does the sizeof operator return a size for a structure that is larger than the total of the sizes of the structure's members?

Kevin asked Sep 23 '08 04:09




2 Answers

This is because of padding added to satisfy alignment constraints. Data structure alignment impacts both performance and correctness of programs:

  • Misaligned access might be a hard error (often SIGBUS).
  • Misaligned access might be a soft error.
    • Either corrected in hardware, for a modest performance degradation.
    • Or corrected by emulation in software, for a severe performance degradation.
    • In addition, atomicity and other concurrency guarantees might be broken, leading to subtle errors.

Here's an example using typical settings for an x86 processor (the results are the same in both 32-bit and 64-bit modes):

struct X
{
    short s; /* 2 bytes */
             /* 2 padding bytes */
    int   i; /* 4 bytes */
    char  c; /* 1 byte */
             /* 3 padding bytes */
};

struct Y
{
    int   i; /* 4 bytes */
    char  c; /* 1 byte */
             /* 1 padding byte */
    short s; /* 2 bytes */
};

struct Z
{
    int   i; /* 4 bytes */
    short s; /* 2 bytes */
    char  c; /* 1 byte */
             /* 1 padding byte */
};

const int sizeX = sizeof(struct X); /* = 12 */
const int sizeY = sizeof(struct Y); /* = 8 */
const int sizeZ = sizeof(struct Z); /* = 8 */

One can minimize the size of structures by sorting members by alignment, largest first, as in structure Z in the example above. For basic types, sorting by size is sufficient.

IMPORTANT NOTE: Both the C and C++ standards state that structure alignment is implementation-defined. Therefore each compiler may choose to align data differently, resulting in different and incompatible data layouts. For this reason, when dealing with libraries that will be used by different compilers, it is important to understand how the compilers align data. Some compilers have command-line settings and/or special #pragma statements to change the structure alignment settings.

6 revs, 6 users 79% answered Sep 19 '22 19:09


Packing and byte alignment, as described in the C FAQ:

It's for alignment. Many processors can't access 2- and 4-byte quantities (e.g. ints and long ints) if they're crammed in every-which-way.

Suppose you have this structure:

struct {
    char a[3];
    short int b;
    long int c;
    char d[3];
};

Now, you might think that it ought to be possible to pack this structure into memory like this:

+-------+-------+-------+-------+
|           a           |   b   |
+-------+-------+-------+-------+
|   b   |           c           |
+-------+-------+-------+-------+
|   c   |           d           |
+-------+-------+-------+-------+

But it's much, much easier on the processor if the compiler arranges it like this:

+-------+-------+-------+
|           a           |
+-------+-------+-------+
|       b       |
+-------+-------+-------+-------+
|               c               |
+-------+-------+-------+-------+
|           d           |
+-------+-------+-------+

In the packed version, notice how it's at least a little bit hard for you and me to see how the b and c fields wrap around? In a nutshell, it's hard for the processor, too. Therefore, most compilers will pad the structure (as if with extra, invisible fields) like this:

+-------+-------+-------+-------+
|           a           | pad1  |
+-------+-------+-------+-------+
|       b       |     pad2      |
+-------+-------+-------+-------+
|               c               |
+-------+-------+-------+-------+
|           d           | pad3  |
+-------+-------+-------+-------+
EmmEff answered Sep 18 '22 19:09