 

Struct type aliasing / tagged-union without union

For two (or more) structs: Base and Sub with a common first (unnamed) struct, is it safe to convert/cast from Base to Sub and vice versa?

struct Base{
    struct{
        int id;
        // ...
    };
    char data[]; // necessary?
};

struct Sub{
    struct{
        int id;
        // same '...'
    };
    // actual data
};

Are these functions guaranteed to be safe and technically correct? (Also: is the flexible array member char data[] necessary and useful?)

struct Base * subToBase(struct Sub * s){
    return (struct Base*)s;
}

struct Sub * baseToSub(struct Base * b){
    if(b->id != SUB_ID){
        return NULL;
    }

    return (struct Sub*)b;
}

Edit

I have no plans to nest any further than Base within Sub, but rather leave the possibility to add other sub-types (directly under Base) later without needing to change Base. My main concern is whether pointers to the structs can be safely converted back and forth between Base and any sub. References to the (C11) standard would be most appreciated.

Edit v2

Changed the wording slightly to discourage OOP/inheritance discussions. What I want is a tagged-union, without the union so it can be extended later. I have no plans for doing additional nesting. Sub-types that need other sub-types' functionality can do so explicitly, without doing any further nesting.


Context

For a script interpreter I have made a pseudo object-oriented tagged-union type system, without the union. It has an (abstract) generic base type Object with several (specific) sub-types, such as String, Number, List etc. Every type-struct has the following unnamed struct as the first member:

#define OBJHEAD struct{    \
    int id;                \
    int line;              \
    int column;            \
}

The id identifies the type of object, line and column should (also) be self-explanatory. A simplified implementation of various objects:

typedef struct Object{
    OBJHEAD;

    char data[]; // necessary?
} Object;

typedef struct Number{
    OBJHEAD;

    int value; // only int for simplicity
} Number;

typedef struct String{
    OBJHEAD;

    size_t length;
    char * string;
} String;

typedef struct List{
    OBJHEAD;

    size_t size;
    Object ** elements; // array of pointers: may be any kind and mix of objects
} List;

Object * Number_toObject(Number * num){
    return (Object*)num;
}

Number * Number_fromObject(Object * obj){
    if(obj->id != TYPE_NUMBER){
        return NULL;
    }

    return (Number*)obj;
}

I know that the most elegant and technically correct way to do this would be to use an enum for the id and a union for the various sub-types. But I want the type-system to be extensible (through some form of type-registry) so that types can be added later without changing all the Object-related code.

A later/external addition could be:

typedef struct File{
    OBJHEAD;

    FILE * fp;
} File;

without needing to change Object.

Are these conversions guaranteed to be safe?

(As for the small macro-abuse: the OBJHEAD will of course be extensively documented so additional implementers will know what member-names not to use. The idea is not to hide the header, but to save pasting it every time.)

asked Sep 17 '15 19:09 by Kninnug


3 Answers

Converting a pointer to one object type to a pointer to a different object type (via a cast, for instance) is permitted, but if the resulting pointer is not correctly aligned then behavior is undefined (C11 6.3.2.3/7). Depending on the members of Base and Sub and on implementation-dependent behavior, it is not necessarily the case that a Base * converted to a Sub * is correctly aligned. For example, given ...

struct Base{
    struct{
        int id;
    };
    char data[]; // necessary?
};

struct Sub{
    struct{
        int id;
    };
    long long int value;
};

... it may be that the implementation permits Base objects to be aligned on 32-bit boundaries but requires Sub objects to be aligned on 64-bit boundaries, or even on stricter ones.

None of this is affected by whether Base has a flexible array member.
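To make the alignment point concrete, here is a minimal sketch assuming a C11 compiler and the declarations above (the flexible array member is left out since, as just noted, it changes nothing here). The helper `is_aligned_for_sub` is hypothetical and relies on the implementation-defined mapping from pointers to `uintptr_t`, so it is a practical check on common platforms, not a guarantee from the standard. It has to run before the conversion, because the conversion itself is what is undefined for a misaligned pointer.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

struct Base {
    struct {
        int id;
    };
};

struct Sub {
    struct {
        int id;
    };
    long long int value;   /* may raise the alignment requirement of Sub */
};

/* Hypothetical helper: nonzero if p appears suitably aligned for a struct Sub.
   The pointer-to-integer mapping is implementation-defined, so this is a
   practical check on common platforms, not a guarantee from the standard.
   Call it BEFORE converting p to struct Sub *, since the conversion itself
   is undefined when p is misaligned (C11 6.3.2.3/7). */
static int is_aligned_for_sub(const void *p)
{
    return ((uintptr_t)p % alignof(struct Sub)) == 0;
}
```

On typical 64-bit platforms `alignof(struct Base)` is 4 and `alignof(struct Sub)` is 8, but those values are implementation-specific; the only portable relationship is that the stricter requirement belongs to the struct with the more demanding members.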

It is a different question whether it is safe to dereference a pointer value of one type that was obtained by casting a pointer value of a different type. For one thing, C places rather few restrictions on how implementations choose to lay out structures: members must be laid out in the order they are declared, and there must not be any padding before the first one, but otherwise, implementations have free rein. To the best of my knowledge, in your case there is no requirement that the anonymous struct members of your two structures must be laid out the same way as each other if they have more than one member. (And if they have only one member then why use an anonymous struct?) It is also not safe to assume that Base.data starts at the same offset as the first element following the anonymous struct in Sub.

In practice, dereferencing the result of your subToBase() is probably ok, and you can certainly implement tests to verify that. Also, if you have a Base * that was obtained by conversion from a Sub *, then the result of converting it back, for instance via baseToSub(), is guaranteed to be the same as the original Sub * (C11 6.3.2.3/7 again). In that case, the conversion to Base * and back has no effect on the safety of dereferencing the pointer as a Sub *.
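As a sketch of such tests, assuming C11 `_Static_assert` and `offsetof`: the asserts below stop the build if the shared header members of two example structs do not sit at the same offsets, and the round trip Sub * to Base * and back yields the original pointer, as 6.3.2.3/7 guarantees. The struct contents and the name `baseToSub_unchecked` are illustrative only. Passing these checks catches layout differences; it does not settle the aliasing question.

```c
#include <stddef.h>

struct Base {
    struct {
        int id;
        int line;
    };
};

struct Sub {
    struct {
        int id;
        int line;
    };
    double value;
};

/* Compile-time checks of the layout assumptions the casts depend on.
   An implementation that lays out the two headers differently fails
   the build instead of silently misbehaving at run time. */
_Static_assert(offsetof(struct Base, id) == offsetof(struct Sub, id),
               "id offset differs between Base and Sub");
_Static_assert(offsetof(struct Base, line) == offsetof(struct Sub, line),
               "line offset differs between Base and Sub");

struct Base *subToBase(struct Sub *s)
{
    return (struct Base *)s;
}

/* Hypothetical: the cast back without the id check from the question. */
struct Sub *baseToSub_unchecked(struct Base *b)
{
    return (struct Sub *)b;
}
```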

On the other hand, though I'm having trouble finding a reference for it in the standard, I have to say that baseToSub() is very dangerous in the general context. If a Base * that does not actually point to a Sub is converted to Sub * (which in itself is permitted), then it is not safe to dereference that pointer to access members not shared with Base. In particular, given my declarations above, if the referenced object is in fact a Base, then Base.data being declared in no way prevents ((Sub *)really_a_Base_ptr)->value from producing undefined behavior.

To avoid all undefined and implementation-defined behavior, you want an approach that avoids casting and ensures consistent layout. @LoPiTaL's suggestion to embed a typed Base structure inside your Sub structures is a good approach in that regard.

answered Oct 15 '22 01:10 by John Bollinger


No it is not safe, at least not under all circumstances. If your compiler sees two pointers p and q that have different base type, it may always assume that they don't alias, or stated in other words it may always assume that *p and *q are different objects.

Your cast punches a hole in that assumption. That is, if you have a function

struct A { double field0; };
struct B { double field0; };

double foo(struct A* p, struct B* q) {
   double b = q->field0;
   *p = (struct A){ 0 };       // writes through p
   return b + q->field0;       // compiler may return 2*b
}

the optimizer is allowed to avoid the additional read from memory.

If you knew that no function would ever see the same object through differently typed pointers, you would be safe. But such an assertion is not made easily, so you had better avoid such hackery.
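If some code has to inspect the type tag before it knows the concrete type, one way to avoid reading it through a differently typed pointer is to copy the bytes with memcpy. This is a sketch under the assumption, taken from the question's OBJHEAD, that every type starts with an int id at offset 0; `object_id` is a hypothetical name. It only covers reading the tag: once the tag identifies the object as a Number, the object should be accessed through a Number *.

```c
#include <string.h>

/* Simplified version of the question's header-carrying Number type. */
struct Number {
    struct {
        int id;
        int line;
        int column;
    };
    int value;
};

/* Hypothetical helper: read the leading int tag of any object whose first
   member begins with an int at offset 0. memcpy copies the raw bytes, so
   the object is never read through an lvalue of an unrelated struct type. */
static int object_id(const void *obj)
{
    int id;
    memcpy(&id, obj, sizeof id);
    return id;
}
```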

answered Oct 14 '22 23:10 by Jens Gustedt


It is correct, since the first member of a struct is guaranteed to start at the same address as the struct itself (there is no padding before it), so you can cast from one struct to another.

Nevertheless, the common way to implement your behaviour is to "inherit" the base class:

//Base struct definition
typedef struct Base_{
    int id;
    // ...
    //char data[]; //This is not needed.
}Base;

//Subclass definition
typedef struct Sub_{
    Base base;  //Note: this is NOT a pointer
    // actual data
}Sub;

So now you can cast a Sub * to a Base *, or just use the first member, base, which already has type Base, so no cast is needed.
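A minimal sketch of how the embedded-base layout might be used; `SUB_ID`, `BASE_ID`, `Sub_toBase` and `Sub_fromBase` are hypothetical names. The upcast needs no cast at all, and the downcast relies on C11 6.7.2.1/15: a pointer to a structure, suitably converted, points to its initial member, and vice versa. The downcast is only valid if the Base really is the base member of a Sub, which the id check is meant to ensure.

```c
#include <stddef.h>

enum { BASE_ID = 0, SUB_ID = 1 };  /* hypothetical type ids */

typedef struct Base_ {
    int id;
} Base;

typedef struct Sub_ {
    Base base;    /* embedded by value, not a pointer */
    int value;
} Sub;

/* Upcast without a cast: take the address of the first member. */
static Base *Sub_toBase(Sub *s)
{
    return &s->base;
}

/* Downcast: valid only when b is the base member of an actual Sub
   (C11 6.7.2.1/15); the id check guards against other objects. */
static Sub *Sub_fromBase(Base *b)
{
    if (b->id != SUB_ID)
        return NULL;
    return (Sub *)b;
}
```

Because every sub-type embeds the same Base type, code that reads `b->id` always goes through an lvalue of type Base that refers to a real Base subobject, which sidesteps the question of whether two separately declared anonymous structs are laid out alike.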

One word of caution: do not abuse MACROS. MACROS are nice and good for a lot of things, but abusing them may lead to code that is difficult to read and maintain. In this case, the macro is easily replaced with the base member.

One final word: your macro is error-prone, since the member names are now hidden. In the end, you may add new members with the same names and get weird errors without knowing why.

When you further expand your hierarchy into sub-subclasses, you will end up having to write ALL the base classes' macros, while with the "inherit" approach you only have to write the direct base.
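A sketch of what that looks like with one more level, using hypothetical types: each struct names only its direct parent, and the Base header is reached by walking the chain of first members. Since no member of the chain has padding before it, the Base ends up at the same address as the outermost object.

```c
typedef struct Base_ {
    int id;
} Base;

typedef struct Sub_ {
    Base base;        /* direct base only */
    int value;
} Sub;

typedef struct SubSub_ {
    Sub sub;          /* direct base only; Base is reached through it */
    int extra;
} SubSub;

/* Each level names only its parent; the header is two members deep. */
static Base *SubSub_toBase(SubSub *x)
{
    return &x->sub.base;
}
```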

Neither of these solutions actually solves your problem: inheritance. The only real (and preferred) solution would be to switch to a truly OO language. Due to its similarity to C, the best match would be C++, but any other OO language would do.

answered Oct 15 '22 01:10 by LoPiTaL