size of struct in C different with variables rearranged

Tags:

c

padding

struct

I'm trying to understand why the size of a struct changes when I move its members around. I know padding is involved, but it's not apparent what is happening in the background.

struct test1 {
    long y;
    int a;
    short int b;
    short int t;
};

sizeof(struct test1) = 16

struct test2 {
    long y;
    short int b;
    int a;
    short int t;
};

sizeof(struct test2) = 24

struct test3 {
    int a;
    long y;
    short int b;
    short int t;
};

sizeof(struct test3) = 24

I get that the size of test1 is 8 + (4+2+2) with no padding, but I don't get why test2 does not give the same result, 8 + (2+4+2), with no padding.

In test3, we see that int takes 4 bytes + 4 bytes of padding, long takes 8 bytes, and the two short ints take 2 bytes + 2 bytes + 4 bytes of padding.

If test3 can make the two short ints contiguous, why doesn't test2 make short int, int, and short int contiguous?

Also, does this imply that we should always make sure to reorder struct members to minimize padding?

So test1 is ALWAYS better to declare compared to test2 and test3?

EDIT: As a follow up,

struct test4 {
    char asdf[3];
    short int b;
};

sizeof(struct test4) = 6

Shouldn't short int be padded to 4 bytes as char of array size 3 is padded to 4 bytes?

jimmyhuang0904 asked Dec 08 '22 11:12


1 Answer

What's going on in the background is alignment. Alignment is the requirement that objects of a data type be placed at an address divisible by some unit. If that alignment unit is the size of the type itself, that is the strictest alignment the type can have in a conforming C implementation.

C compilers tend to ensure certain alignment in struct layouts, even when the requirement doesn't come from the target hardware.

If we have a long that is, say, 4 bytes, followed by a two-byte short, that short can be placed immediately after the long, because offset 4 is more than sufficiently aligned for a two-byte type. The offset after those two members is then 6. But your compiler doesn't consider 6 a suitable offset for a 4-byte int; it wants a multiple of 4. Two bytes of padding are inserted to move that int to offset 8.

Of course, the actual numbers are compiler-specific. You have to know the sizes of your types and the alignment requirements and rules.

Also, does this imply that we should always make sure to reorder struct members to minimize padding?

If minimal structure size is important in your application, then you have to order the members from most strictly aligned to least strictly aligned. If minimal structure size isn't important, then you don't have to care about this.

Other concerns may weigh in, like compatibility with an externally imposed layout.

Or incremental growth. If a publicly used structure (referenced by numerous instances of compiled code such as executables and dynamic libraries) is maintained over time across multiple versions, typically new members must be added only at the end. In that case, we don't get the optimal order for minimum size, even if we would like that.

Shouldn't short int be padded to 4 bytes as char of array size 3 is padded to 4 bytes?

No, because the one byte of padding after the char[3] array brings the offset to 4. That offset is more than sufficiently aligned for the placement of a two-byte short. Moreover, no padding is required after that short. Why? The offset after the short is 6. The most strictly aligned member of the structure is that short, with an alignment requirement of 2, and 6 is divisible by 2.

Here is a situation in which padding would be required after the two-byte short: struct { long x; short y; }. Say long is 8 bytes (the argument is the same if it is 4). If we place the 2-byte short after the 8-byte long, we have a size of 10. That causes a problem if we declare an array a of this structure, because a[1].x will be at offset 10 from the base of the array: x is misaligned. The most strictly aligned structure member is x, with an alignment requirement of (say) 8, same as its size. Thus, for the sake of array alignment, the structure must be padded for its size to be divisible by 8: 6 bytes of padding at the end are required to bring the size to 16.

Basically padding before a member is for its own alignment, and padding at the end of a structure is to ensure that all members are aligned in an array, and that is driven by the most strictly aligned member.

Alignment is a hard hardware requirement on some platforms! If, say, a four byte data type is accessed at an address not divisible by four, a CPU exception occurs. On some such platforms, the CPU exception can be handled by the operating system, which implements the misaligned access in software rather than passing a potentially fatal signal to the process. That access is then very expensive, probably requiring on the order of a few hundred instructions. I seem to recall that in the MIPS port of Linux, this is a per-process option: handling misaligned exceptions can be turned on for some non-portable programs (e.g. ones developed for Intel x86) that depend on it, yet not turned on for programs which only perform a misaligned access due to some corruption bug (e.g. uninitialized pointer aimed at valid memory by luck, but at a misaligned address).

On some platforms, the hardware handles misaligned access, but still at somewhat of a cost compared to aligned access. For instance, two memory accesses may have to be made instead of one.

C compilers tend to enforce alignment when allocating struct members and variables even for target machines that don't enforce alignment. This is likely done for reasons such as performance and compatibility.

Kaz answered Jan 06 '23 11:01