Why can't C compilers rearrange struct members to eliminate alignment padding? [duplicate]


There are multiple reasons why the C compiler cannot automatically reorder the fields:

  • The C compiler doesn't know whether the struct describes a memory layout that is also defined outside the current compilation unit (for example: a foreign library, a file on disk, network data, CPU page tables, ...). In such a case the binary layout of the data is fixed in a place the compiler cannot see, so reordering the struct fields would create a data type that is inconsistent with the other definitions. For example, the local file header of an entry in a ZIP archive contains several misaligned 32-bit fields. Reordering the fields would make it impossible for C code to read or write the header directly (assuming the ZIP implementation wants to access the data that way):

    struct __attribute__((__packed__)) LocalFileHeader {
        uint32_t signature;
        uint16_t minVersion, flag, method, modTime, modDate;
        uint32_t crc32, compressedSize, uncompressedSize;
        uint16_t nameLength, extraLength;
    };
    

    The packed attribute prevents the compiler from aligning the fields according to their natural alignment, and it has no relation to the problem of field ordering. It would be possible to reorder the fields of LocalFileHeader so that the structure has both minimal size and has all fields aligned to their natural alignment. However, the compiler cannot choose to reorder the fields because it does not know that the struct is actually defined by the ZIP file specification.

  • C is an unsafe language. The C compiler doesn't know whether the data will be accessed via a different type than the one seen by the compiler, for example:

    struct S {
        char a;
        int b;
        char c;
    };
    
    struct S_head {
        char a;
    };
    
    struct S_ext {
        char a;
        int b;
        char c;
        int d;
        char e;
    };
    
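    void fn1(struct S_head *head);
    void fn2(struct S *sp);
    
    void example(void) {                            // wrapper added so the statements compile
        struct S s;
        struct S_head *head = (struct S_head*)&s;   // view an S through its one-member prefix
        fn1(head);
    
        struct S_ext ext;
        struct S *sp = (struct S*)&ext;             // view an S_ext through its S-shaped prefix
        fn2(sp);
    }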
    

    This is a widely used low-level programming pattern, especially if the header contains the type ID of data located just beyond the header.

  • If a struct type is embedded in another struct type, the compiler cannot flatten the inner struct into the outer one and interleave their fields:

    struct S {
        char a;
        int b;
        char c, d, e;
    };
    
    struct T {
        char a;
        struct S s; // Cannot inline S into T, 's' has to be contiguous in memory
        char b;
    };
    

    This also means that moving some fields from S to a separate struct disables some optimizations:

    // Cannot fully optimize S
    struct BC { int b; char c; };
    struct S {
        char a;
        struct BC bc;
        char d, e;
    };
    
  • Reordering struct fields would require new optimizations to be implemented in the compiler, and it is questionable whether they would do better than what programmers can write by hand. Laying out a data structure by hand takes far less effort than doing by hand the other jobs a compiler performs, such as register allocation, function inlining, constant folding, or turning a switch statement into a binary search. The benefit of letting the compiler optimize data layout is therefore less tangible than that of traditional compiler optimizations.
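
The point about programmers arranging fields by hand can be shown with a minimal sketch. The names `Padded`, `Reordered` and `bytes_saved` are made up for this illustration, and the offsets in the comments assume a typical ABI where `int` is 4 bytes with 4-byte alignment; other targets may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example types, not taken from any real API. */
struct Padded {
    char a;   /* offset 0, typically followed by 3 padding bytes */
    int  b;   /* typically offset 4 */
    char c;   /* typically offset 8, followed by 3 bytes of tail padding */
};

/* Same members, widest first: a manual reordering. */
struct Reordered {
    int  b;   /* offset 0 */
    char a;   /* typically offset 4 */
    char c;   /* typically offset 5, followed by 2 bytes of tail padding */
};

/* The first member always sits at offset 0; this holds on every implementation. */
static_assert(offsetof(struct Reordered, b) == 0, "first member is at offset 0");

/* Bytes saved per object by the manual reordering (4 on the ABI assumed above). */
static inline size_t bytes_saved(void)
{
    return sizeof(struct Padded) - sizeof(struct Reordered);
}
```

The compiler keeps both layouts exactly as written; choosing between them is left to the programmer.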


C is designed and intended to make it possible to write non-portable, hardware- and format-dependent code in a high-level language. Rearranging structure contents behind the programmer's back would destroy that ability.

Observe this actual code from NetBSD's ip.h:


/*
 * Structure of an internet header, naked of options.
 */
struct ip {
#if BYTE_ORDER == LITTLE_ENDIAN
    unsigned int ip_hl:4,       /* header length */
             ip_v:4;        /* version */
#endif
#if BYTE_ORDER == BIG_ENDIAN
    unsigned int ip_v:4,        /* version */
             ip_hl:4;       /* header length */
#endif
    u_int8_t  ip_tos;       /* type of service */
    u_int16_t ip_len;       /* total length */
    u_int16_t ip_id;        /* identification */
    u_int16_t ip_off;       /* fragment offset field */
    u_int8_t  ip_ttl;       /* time to live */
    u_int8_t  ip_p;         /* protocol */
    u_int16_t ip_sum;       /* checksum */
    struct    in_addr ip_src, ip_dst; /* source and dest address */
} __packed;

That structure is identical in layout to the header of an IP datagram. It is used to directly interpret blobs of memory blatted in by an Ethernet controller as IP datagram headers. Imagine if the compiler arbitrarily re-arranged the contents out from under the author -- it would be a disaster.

And yes, it isn't precisely portable (and there's even a non-portable gcc directive given there via the __packed macro) but that's not the point. C is specifically designed to make it possible to write non-portable high level code for driving hardware. That's its function in life.
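
As a rough sketch of the same idea, a layout can be pinned to the wire format with compile-time checks. The struct `wire_ipv4` and the helper functions are hypothetical simplifications written for this example (they are not NetBSD's definitions), and `__attribute__((__packed__))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified IPv4 header without options. */
struct wire_ipv4 {
    uint8_t  ver_ihl;    /* version in the high nibble, header length in the low nibble */
    uint8_t  tos;
    uint16_t total_len;  /* multi-byte fields are in network byte order */
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  proto;
    uint16_t checksum;
    uint32_t src;
    uint32_t dst;
} __attribute__((__packed__));

/* These checks only make sense because the compiler may not reorder members. */
static_assert(sizeof(struct wire_ipv4) == 20, "header without options is 20 bytes");
static_assert(offsetof(struct wire_ipv4, ttl) == 8, "TTL is byte 8 on the wire");
static_assert(offsetof(struct wire_ipv4, src) == 12, "source address starts at byte 12");

/* Copy the first 20 bytes of a received datagram into the struct. */
static inline struct wire_ipv4 parse_ipv4(const uint8_t *bytes)
{
    struct wire_ipv4 h;
    memcpy(&h, bytes, sizeof h);
    return h;
}

static inline unsigned ipv4_version(const struct wire_ipv4 *h)
{
    return h->ver_ihl >> 4;
}

static inline unsigned ipv4_header_words(const struct wire_ipv4 *h)
{
    return h->ver_ihl & 0x0Fu;
}
```

This sketch reads the two 4-bit fields by shifting and masking instead of using bit-fields, which avoids the byte-order `#if` seen in the original header.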


C [and C++] are regarded as systems programming languages, so they provide low-level access to the hardware, e.g., to memory by means of pointers. A programmer can take a chunk of data, cast it to a structure, and access the various members [easily].

Another example is a struct like the one below, which stores variable-sized data.

struct {
  uint32_t data_size;
  uint8_t  data[1]; // this has to be the last member
} _vv_a;
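
A sketch of how such a variable-sized record is typically allocated. It uses the C99 flexible array member (`data[]`) in place of the older `data[1]` idiom shown above; the names `blob` and `blob_new` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct blob {
    uint32_t data_size;
    uint8_t  data[];     /* flexible array member: must be the last member */
};

/* The payload starts after the size field; reordering would break this. */
static_assert(offsetof(struct blob, data) >= sizeof(uint32_t),
              "payload follows the size field");

/* Allocate a blob holding a copy of n bytes from src.
 * Returns NULL on allocation failure; the caller releases it with free(). */
static inline struct blob *blob_new(const void *src, uint32_t n)
{
    struct blob *b = malloc(sizeof *b + n);
    if (b == NULL)
        return NULL;
    b->data_size = n;
    if (n > 0)
        memcpy(b->data, src, n);
    return b;
}
```

The header and the payload live in a single allocation, which only works if the array is guaranteed to be the final thing in the struct.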

Not being a member of WG14, I can't say anything definitive, but I have my own ideas:

  1. It would violate the principle of least surprise - there may be a damned good reason why I want to lay my elements out in a specific order, regardless of whether or not it's the most space-efficient, and I would not want the compiler to rearrange those elements;

  2. It has the potential to break a non-trivial amount of existing code - there's a lot of legacy code out there that relies on things like the address of the struct being the same as the address of the first member (I saw a lot of classic Mac OS code that made that assumption).

The C99 Rationale directly addresses the second point ("Existing code is important, existing implementations are not") and indirectly addresses the first ("Trust the programmer").
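
The first-member assumption mentioned in point 2 looks roughly like this in practice. The types `shape_header` and `circle` and the function `kind_of` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged-record types, for illustration only. */
enum shape_kind { SHAPE_CIRCLE = 1, SHAPE_RECT = 2 };

struct shape_header {
    enum shape_kind kind;
};

struct circle {
    struct shape_header hdr;   /* must stay the first member */
    double radius;
};

/* C guarantees there is no padding before the first member. */
static_assert(offsetof(struct circle, hdr) == 0, "header is at offset 0");

/* Read the tag from any object whose first member is a shape_header. */
static inline enum shape_kind kind_of(const void *obj)
{
    return ((const struct shape_header *)obj)->kind;
}
```

Code written this way would silently read the wrong bytes if a compiler were free to move `hdr` behind `radius`.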


Reordering the structure members would change the semantics of pointer operations, because member addresses are visible to the program. If you care about compact memory representation, it's your responsibility as a programmer to know your target architecture and organize your structures accordingly.
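
A small sketch of why member order is visible through pointers: the C standard requires members to have increasing addresses in declaration order, so a program can observe the order. The struct `sample` and the function `gap_b_to_c` are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct used only to observe member order. */
struct sample {
    char a;
    int  b;
    char c;
};

/* Declaration order is address order, so these hold on every implementation. */
static_assert(offsetof(struct sample, a) < offsetof(struct sample, b), "a precedes b");
static_assert(offsetof(struct sample, b) < offsetof(struct sample, c), "b precedes c");

/* Byte distance from member b to member c within one object. */
static inline ptrdiff_t gap_b_to_c(const struct sample *s)
{
    return (const char *)&s->c - (const char *)&s->b;
}
```

Any code that compares or subtracts member addresses like this depends on the declared order being the real order.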


If you were reading/writing binary data to/from C structures, reordering of the struct members would be a disaster. There would be no practical way to actually populate the structure from a buffer, for example.
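
A minimal sketch of populating a struct from a buffer, reusing the `LocalFileHeader` layout quoted earlier in this thread. The helpers `header_from_bytes` and `header_to_bytes` are hypothetical, the packed attribute is a GCC/Clang extension, and the direct copy assumes a little-endian host because ZIP stores its fields in little-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct __attribute__((__packed__)) LocalFileHeader {
    uint32_t signature;
    uint16_t minVersion, flag, method, modTime, modDate;
    uint32_t crc32, compressedSize, uncompressedSize;
    uint16_t nameLength, extraLength;
};

/* The copy below is only meaningful if the struct matches the file byte for byte. */
static_assert(sizeof(struct LocalFileHeader) == 30, "fixed part of the header is 30 bytes");
static_assert(offsetof(struct LocalFileHeader, crc32) == 14, "crc32 is at byte 14");

/* Fill the struct from 30 raw bytes (assumes a little-endian host). */
static inline void header_from_bytes(struct LocalFileHeader *out, const uint8_t *buf)
{
    memcpy(out, buf, sizeof *out);
}

/* Write the struct back out as 30 raw bytes (same assumption). */
static inline void header_to_bytes(uint8_t *buf, const struct LocalFileHeader *in)
{
    memcpy(buf, in, sizeof *in);
}
```

If the compiler could reorder the members, neither the size check nor the byte-for-byte copy would be reliable.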


Structs are used to represent physical hardware at the very lowest levels. As such, the compiler cannot move things around to suit itself at that level.

That said, it would not be unreasonable to have a #pragma that let the compiler re-arrange purely memory-based structs that are only used internally to the program. I don't know of such a beast (but that doesn't mean squat - I'm out of touch with C/C++).