
Union data structure alignment

I was working with some code (which I thought was bad) that had a union like:

union my_msg_union
{
  struct message5 msg;
  char buffer[256];
} message;

The buffer was filled with 256 bytes from comms. The struct is something like:

struct message5 {
  uint8 id;
  uint16 size;
  uint32 data;
  uint8 num_ids;
  uint16 ids[4];
} message5d;

The same code was being compiled on heaps of architectures (8-bit AVR, 16-bit Philips, 32-bit ARM, 32-bit x86 and amd64).

The problem, I thought, was the use of the union: the code just reads a blob of received serial bytes into the buffer, then reads the values out through the struct, without considering the alignment/padding of the struct.

Sure enough, a quick look at sizeof(message5d) on different systems gave different results.

What surprised me, however, is that whenever the union with the char [] existed, all instances of all structs of that type, on all systems, dropped their padding/alignment and were laid out as sequential bytes.

Is this required by the C standard, or just something that compiler authors have put in to 'help'?

asked Oct 21 '22 19:10 by Myforwik


1 Answer

This code demonstrates the opposite behaviour from the one you describe:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct message5
{
    uint8_t id;
    uint16_t size;
    uint32_t data;
    uint8_t num_ids;
    uint16_t ids[4];
};

#if !defined(NO_UNION)
union my_msg_union
{
    struct message5 msg;
    char buffer[256];
};
#endif /* NO_UNION */

struct data
{
    char const *name;
    size_t offset;
};

int main(void)
{
    struct data offsets[] =
    {
        { "message5.id", offsetof(struct message5, id) },
        { "message5.size", offsetof(struct message5, size) },
        { "message5.data", offsetof(struct message5, data) },
        { "message5.num_ids", offsetof(struct message5, num_ids) },
        { "message5.ids", offsetof(struct message5, ids) },
#if !defined(NO_UNION)
        { "my_msg_union.msg.id", offsetof(union my_msg_union, msg.id) },
        { "my_msg_union.msg.size", offsetof(union my_msg_union, msg.size) },
        { "my_msg_union.msg.data", offsetof(union my_msg_union, msg.data) },
        { "my_msg_union.msg.num_ids", offsetof(union my_msg_union, msg.num_ids) },
        { "my_msg_union.msg.ids", offsetof(union my_msg_union, msg.ids) },
#endif /* NO_UNION */
    };
    enum { NUM_OFFSETS = sizeof(offsets) / sizeof(offsets[0]) };

    for (size_t i = 0; i < NUM_OFFSETS; i++)
        printf("%-25s  %3zu\n", offsets[i].name, offsets[i].offset);
    return 0;
}

Sample output (GCC 4.8.2 on Mac OS X 10.9 Mavericks, 64-bit compilation):

message5.id                  0
message5.size                2
message5.data                4
message5.num_ids             8
message5.ids                10
my_msg_union.msg.id          0
my_msg_union.msg.size        2
my_msg_union.msg.data        4
my_msg_union.msg.num_ids     8
my_msg_union.msg.ids        10

The offsets within the union are the same as the offsets within the structure, as the C standard requires.
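If a C11 compiler is available, the same comparison can be made at compile time on each target rather than by reading printed output. This is only a sketch: message5_padding is a hypothetical helper name of mine, and the nested member designators in offsetof are used in the same way as in the program above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

struct message5
{
    uint8_t id;
    uint16_t size;
    uint32_t data;
    uint8_t num_ids;
    uint16_t ids[4];
};

union my_msg_union
{
    struct message5 msg;
    char buffer[256];
};

/* Both union members start at offset 0. */
static_assert(offsetof(union my_msg_union, msg) == 0, "msg not at offset 0");
static_assert(offsetof(union my_msg_union, buffer) == 0, "buffer not at offset 0");

/* Being inside the union does not change the layout of the structure. */
static_assert(offsetof(union my_msg_union, msg.size) == offsetof(struct message5, size),
              "size moved inside the union");
static_assert(offsetof(union my_msg_union, msg.data) == offsetof(struct message5, data),
              "data moved inside the union");
static_assert(offsetof(union my_msg_union, msg.num_ids) == offsetof(struct message5, num_ids),
              "num_ids moved inside the union");
static_assert(offsetof(union my_msg_union, msg.ids) == offsetof(struct message5, ids),
              "ids moved inside the union");

/* Hypothetical helper: bytes of padding the compiler added to struct message5,
 * i.e. its size minus the sum of the sizes of its members. */
size_t message5_padding(void)
{
    size_t members = sizeof(uint8_t) + sizeof(uint16_t) + sizeof(uint32_t)
                   + sizeof(uint8_t) + 4 * sizeof(uint16_t);
    return sizeof(struct message5) - members;
}
```

If any of these assertions failed on one of your targets, the compiler's message would identify exactly which member moved, which is the kind of evidence needed to pin down the behaviour you describe.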

To show otherwise, you would have to give a complete, compilable counter-example based on the code above, and specify which compiler and platform you are compiling on to get your deviant answer, if indeed you can reproduce it.

I note that I had to change uint8 etc. to uint8_t etc., but I don't think that makes any difference. If it does, you need to specify which header you get names like uint8 from.


Code updated to be compilable with or without union. Output when compiled with -DNO_UNION:

message5.id                  0
message5.size                2
message5.data                4
message5.num_ids             8
message5.ids                10
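
Both runs show padding after id (offset 1) and after num_ids (offset 9), so a message that arrives as sequential bytes cannot safely be read through the structure on this platform. One way around that is to decode the buffer field by field. The following is a minimal sketch under two assumptions that the question does not confirm: that the wire format is the fields packed back to back (16 bytes), and that multi-byte fields are little-endian. The names message5_decode, get_le16, get_le32 and MESSAGE5_WIRE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct message5
{
    uint8_t id;
    uint16_t size;
    uint32_t data;
    uint8_t num_ids;
    uint16_t ids[4];
};

/* ASSUMPTION: packed wire layout, 1 + 2 + 4 + 1 + 4 * 2 bytes. */
enum { MESSAGE5_WIRE_SIZE = 16 };

/* ASSUMPTION: little-endian byte order on the wire. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one message from buf into *out without relying on the
 * compiler's structure layout. Returns 0 on success, -1 if buf is
 * shorter than one packed message. */
int message5_decode(struct message5 *out, const unsigned char *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    if (len < MESSAGE5_WIRE_SIZE)
        return -1;
    out->id      = buf[0];
    out->size    = get_le16(buf + 1);
    out->data    = get_le32(buf + 3);
    out->num_ids = buf[7];
    for (size_t i = 0; i < 4; i++)
        out->ids[i] = get_le16(buf + 8 + 2 * i);
    return 0;
}
```

Because every field is read from an explicit byte offset, the result does not depend on the padding, alignment or byte order of the machine the code is compiled for.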
answered Nov 04 '22 20:11 by Jonathan Leffler