Structure over flexible array member

I'm writing a C program (g++ compilable) that has to deal with a lot of different structures, all coming from a buffer with a predefined format. The format specifies which type of structure I should load. This could be solved using unions, but the huge difference in the sizes of the structures made me opt for a structure with a void * in it:

struct msg {
    int type;
    void * data; /* may be any of the 50 defined structures: @see type */
};

The problem with that is that I need two malloc calls and two free calls per message. For me, function calls are expensive and malloc is expensive. From the user's side, it would be great to simply free the msgs. So I changed the definition to:

struct msg {
    int type;
    uint8_t data[]; /* flexible array member */
};
...
struct msg_10 {
    uint32_t flags[4];
    ...
};

Whenever I need to deserialize a message, I do:

struct msg * deserialize_10(uint8_t * buffer, size_t length) {
    struct msg * msg = (struct msg *) malloc(offsetof(struct msg, data) + sizeof(struct msg_10));
    struct msg_10 * payload = (__typeof__(payload))msg->data;

    /* load payload */
    return msg;
}

And to get a member of that structure:

uint32_t msg10_flags(const struct msg * msg, int i)
{
    return ((struct msg_10 *)(msg->data))->flags[i];
}

With this change, gcc (and g++) issue a nice warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing].

I think this is a common problem (but I found no answer here): how to represent a family of messages in C in an efficient manner.

I understand why the warning appears. My questions are the following:

  1. Is it possible to implement something like this, without the warning or is it intrinsically flawed? (the or is not exclusive :P, and I'm almost convinced I should refactor)
  2. Would it be better to represent the messages using something like the following code?

    struct msg {
        int type;
    };
    ...
    struct msg_10 {
        struct msg base; /* or int type; must be the first member */
        uint32_t flags[4];
        ...
    };
    
  3. If yes, caveats? Can I always write and use the following?

    struct msg * deserialize_10(uint8_t * buffer, size_t length) {
        struct msg_10 * msg = (struct msg_10 *) malloc(sizeof(struct msg_10));
    
        /* load payload */
        return (struct msg *)msg;
    }
    
    uint32_t msg10_flags(const struct msg * msg, int i) {
        const struct msg_10 * msg10 = (const struct msg_10 *) msg;
        return msg10->flags[i];
    }
    
  4. Any other?

I forgot to say that this runs on low-level systems and performance is a priority but, all in all, the real issue is how to handle this "multi-message" structure. I may refactor once, but changing the implementation of the deserialization of 50 message types...

Mance Rayder asked Mar 05 '23 22:03


2 Answers

To dodge the strict aliasing problem, you can wrap your struct inside a union. With C11 you can use an anonymous struct to get rid of the extra level needed to access "flags":

typedef union
{
  struct
  {
    uint32_t flags[4];
  };  
  uint8_t bytes[ sizeof(uint32_t[4]) ];
} msg_10;

And now you can do msg_10* payload = (msg_10*)msg->data; and access payload without worrying about strict aliasing violations, since the union type includes a type (uint8_t[]) compatible with the effective type of the object.
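
As a rough sketch of how the union could be combined with the flexible array member from the question: make_msg_10 is a hypothetical helper name, and the code assumes that data starts at an offset suitably aligned for msg_10 (checked at compile time). The accessor keeps the question's msg10_flags name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct msg {
    int type;
    uint8_t data[]; /* flexible array member, as in the question */
};

/* Union from the answer: a C11 anonymous struct overlaid with a byte array. */
typedef union {
    struct {
        uint32_t flags[4];
    };
    uint8_t bytes[sizeof(uint32_t[4])];
} msg_10;

/* Assumption: the payload offset is a multiple of the union's alignment. */
_Static_assert(offsetof(struct msg, data) % _Alignof(msg_10) == 0,
               "payload would be misaligned");

/* Hypothetical helper: header and payload come from a single malloc. */
struct msg *make_msg_10(const uint32_t flags[4])
{
    struct msg *msg = malloc(offsetof(struct msg, data) + sizeof(msg_10));
    if (msg == NULL)
        return NULL;
    msg->type = 10;
    msg_10 *payload = (msg_10 *)msg->data;
    for (size_t i = 0; i < 4; i++)
        payload->flags[i] = flags[i];
    return msg;
}

uint32_t msg10_flags(const struct msg *msg, size_t i)
{
    assert(i < 4);
    const msg_10 *payload = (const msg_10 *)msg->data;
    return payload->flags[i];
}
```

The caller releases the whole message with a single free(msg), which is what the question was after.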

Please note, however, that the memory returned by malloc has no effective type until you store a value into it through an lvalue of a certain type. So alternatively, you can make sure that the first write after malloc uses the correct type, and later reads through that type won't give a strict aliasing violation either. Something like

struct msg_10 * msg = malloc(sizeof(struct msg_10));
*msg = (struct msg_10){0}; /* the store sets the effective type */

The zero value itself doesn't matter; the store is just there to set the effective type.
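
A different technique, not taken from the answer above, is to never dereference a struct msg_10 pointer into the byte array at all and to move the data with memcpy instead, which is always allowed on bytes. The sketch below reuses the question's function names and assumes, purely for illustration, that the wire format of message 10 is just the four flags in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct msg {
    int type;
    uint8_t data[]; /* flexible array member */
};

struct msg_10 {
    uint32_t flags[4];
};

/* Assumed wire format: four uint32_t flags, host byte order, no padding. */
struct msg *deserialize_10(const uint8_t *buffer, size_t length)
{
    if (length < sizeof(struct msg_10))
        return NULL;
    struct msg *msg = malloc(offsetof(struct msg, data) + sizeof(struct msg_10));
    if (msg == NULL)
        return NULL;
    msg->type = 10;
    /* Byte-wise copy into the payload area: no type punning involved. */
    memcpy(msg->data, buffer, sizeof(struct msg_10));
    return msg;
}

uint32_t msg10_flags(const struct msg *msg, size_t i)
{
    uint32_t flag;
    assert(i < 4);
    /* Copy one flag out of the byte array instead of casting msg->data. */
    memcpy(&flag,
           msg->data + offsetof(struct msg_10, flags) + i * sizeof flag,
           sizeof flag);
    return flag;
}
```

Compilers typically turn a small fixed-size memcpy like this into a plain load, so the accessor should not cost an actual function call, but that is worth checking in the generated code on the target platform.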

Lundin answered Mar 10 '23 12:03


You can certainly build something like this using macros; message_header works as a parent struct for all message types. Because it is the first member of each such struct, the header and the enclosing struct share the same address. Hence, after creating a msg(int) and casting its pointer to a message_header pointer, you can free it simply by calling free on it. (C++ works somewhat the same, btw.)

Is this what you wanted?

#include <stdlib.h>

struct message_header {
    int type;
};

/* Each expansion of msg(T) declares a new, distinct anonymous struct type. */
#define msg(T) struct {struct message_header header; T data;}

struct message_header* init_msg_int(int a) {
    /* No cast on malloc: (msg(int)*) would name yet another anonymous type. */
    msg(int)* result = malloc(sizeof *result);
    if (result == NULL)
        return NULL;
    result->data = a;
    return (struct message_header*)result;
}

int get_msg_int(struct message_header* msg) {
    return ((msg(int)*)msg)->data;
}

void free_msg(struct message_header* msg) {
    free(msg);
}
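
For comparison, here is a minimal sketch of the same first-member idea with a named struct instead of the macro, which is closer to option 2 in the question. The names new_msg_10 and msg_10 are illustrative assumptions; the only guarantee relied on is that a pointer to a struct, suitably converted, points to its first member and vice versa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct message_header {
    int type;
};

/* Hypothetical message type 10: the header must be the first member. */
struct msg_10 {
    struct message_header header;
    uint32_t flags[4];
};

/* One malloc for header and payload; the caller frees the returned pointer. */
struct message_header *new_msg_10(const uint32_t flags[4])
{
    struct msg_10 *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->header.type = 10;
    for (size_t i = 0; i < 4; i++)
        m->flags[i] = flags[i];
    return &m->header; /* same address as m */
}

uint32_t msg10_flags(const struct message_header *msg, size_t i)
{
    /* Only convert back after checking the tag stored in the header. */
    assert(msg->type == 10 && i < 4);
    const struct msg_10 *m = (const struct msg_10 *)msg;
    return m->flags[i];
}
```

Because the object was allocated and written as a struct msg_10, converting the header pointer back to struct msg_10 * reads it through its own type, and a single free on the header pointer releases everything.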

Domso answered Mar 10 '23 10:03