OO Polymorphism in C, aliasing issues?

A colleague and I are trying to implement a simple polymorphic class hierarchy. We're working on an embedded system and are restricted to a C compiler. We have a basic design that compiles without warnings (-Wall -Wextra -fstrict-aliasing -pedantic) and runs fine under gcc 4.8.1.

However, we are a bit worried about aliasing issues, as we do not fully understand when they become a problem.

To demonstrate, we have written a toy example with an 'interface' IHello and two classes implementing this interface, 'Cat' and 'Dog'.

#include <stdio.h>

/* -------- IHello -------- */
struct IHello_;
typedef struct IHello_
{
    void (*SayHello)(const struct IHello_* self, const char* greeting);
} IHello;

/* Helper function */
void SayHello(const IHello* self, const char* greeting)
{
    self->SayHello(self, greeting);
}

/* -------- Cat -------- */
typedef struct Cat_
{
    IHello hello;
    const char* name;
    int age;
} Cat;

void Cat_SayHello(const IHello* self, const char* greeting)
{
    const Cat* cat = (const Cat*) self;
    printf("%s I am a cat! My name is %s and I am %d years old.\n",
           greeting,
           cat->name,
           cat->age);
}

Cat Cat_Create(const char* name, const int age)
{
    static const IHello catHello = { Cat_SayHello };
    Cat cat;

    cat.hello = catHello;
    cat.name = name;
    cat.age = age;

    return cat;
}

/* -------- Dog -------- */
typedef struct Dog_
{
    IHello hello;
    double weight;
    int age;
    const char* sound;
} Dog;

void Dog_SayHello(const IHello* self, const char* greeting)
{
    const Dog* dog = (const Dog*) self;
    printf("%s I am a dog! I can make this sound: %s I am %d years old and weigh %.1f kg.\n",
           greeting,
           dog->sound,
           dog->age,
           dog->weight);
}

Dog Dog_Create(const char* sound, const int age, const double weight)
{
    static const IHello dogHello = { Dog_SayHello };
    Dog dog;

    dog.hello = dogHello;
    dog.sound = sound;
    dog.age = age;
    dog.weight = weight;

    return dog;
}

/* Client code */
int main(void)
{
    const Cat cat = Cat_Create("Mittens", 5);
    const Dog dog = Dog_Create("Woof!", 4, 10.3);

    SayHello((const IHello*) &cat, "Good day!");
    SayHello((const IHello*) &dog, "Hi there!");

    return 0;
}

Output:

Good day! I am a cat! My name is Mittens and I am 5 years old.

Hi there! I am a dog! I can make this sound: Woof! I am 4 years old and weigh 10.3 kg.

We're pretty sure that the 'upcast' from Cat and Dog to IHello is safe, since IHello is the first member of both these structs.

Our real concern is the 'downcast' from IHello to Cat and Dog respectively in the corresponding interface implementations of SayHello. Does this cause any strict aliasing issues? Is our code guaranteed to work by the C standard or are we simply lucky that this works with gcc?

Update

The solution that we eventually decide to use must be standard C and cannot rely on e.g. gcc extensions. The code must be able to compile and run on different processors using various (proprietary) compilers.

The intention with this 'pattern' is that client code shall receive pointers to IHello and thus only be able to call functions in the interface. However, these calls must behave differently depending on which implementation of IHello that was received. In short, we want identical behaviour to the OOP concept of interfaces and classes implementing this interface.

We are aware of the fact that the code only works if the IHello interface struct is placed as the first member of the structs which implement the interface. This is a limitation that we are willing to accept.
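The first-member requirement described above can be checked at compile time. The following is only a sketch, assuming a C89-or-later compiler that provides offsetof; the macro name ASSERT_FIRST_MEMBER is hypothetical and not part of the original code. It uses the negative-array-size trick, so it needs no C11 _Static_assert and no compiler extensions.

```c
#include <stddef.h> /* offsetof */

/* Hypothetical helper (not from the original post): the typedef has a
 * negative array size, and so fails to compile, unless `member` is at
 * offset 0 of `type`. */
#define ASSERT_FIRST_MEMBER(type, member) \
    typedef char type##_##member##_must_be_first \
        [(offsetof(type, member) == 0) ? 1 : -1]

typedef struct IHello_
{
    void (*SayHello)(const struct IHello_* self, const char* greeting);
} IHello;

typedef struct Cat_
{
    IHello hello; /* must stay the first member */
    const char* name;
    int age;
} Cat;

/* Compiles only while hello is the first member of Cat. */
ASSERT_FIRST_MEMBER(Cat, hello);
```

Moving hello below name in Cat should then turn a silent runtime bug into a compile error.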

According to "Does accessing the first field of a struct via a C cast violate strict aliasing?":

§6.7.2.1/13:

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

The aliasing rule reads as follows (§6.5/7):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

According to the fifth bullet above and the fact that structures contain no padding at the top we are fairly sure that 'upcasting' a derived struct that implements the interface to a pointer to the interface is safe, i.e.

Cat cat;
const IHello* catPtr = (const IHello*) &cat; /* Upcast */

/* Inside client code */
void Greet(const IHello* interface, const char* greeting)
{
    /* Users do not need to know whether interface points to a Cat or Dog. */
    interface->SayHello(interface, greeting); /* Dereferencing should be safe */
}
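As a side note, the upcast can also be written without any cast by taking the address of the embedded interface member, which lets the compiler check the types. A minimal sketch follows; the accessor name Cat_AsHello is hypothetical and not part of the original code.

```c
typedef struct IHello_
{
    void (*SayHello)(const struct IHello_* self, const char* greeting);
} IHello;

typedef struct Cat_
{
    IHello hello;
    const char* name;
    int age;
} Cat;

/* Hypothetical accessor (not in the original post): upcast by taking
 * the address of the first member instead of casting the Cat pointer. */
const IHello* Cat_AsHello(const Cat* cat)
{
    return &cat->hello;
}
```

Because hello is the first member, the returned pointer has the same address as the Cat object itself, so the result is the same pointer value that (const IHello*) &cat would produce.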

The big question is whether the 'downcast' used in the implementation of the interface function(s) is safe. As seen above:

void Cat_SayHello(const IHello* hello, const char* greeting)
{
    /* Is the following statement safe if we know for
     * a fact that hello points to a Cat?
     * Does it violate strict aliasing rules? */
    const Cat* cat = (const Cat*) hello;
    /* Access internal state in Cat */
}

Also note that changing the signature of the implementation functions to

Cat_SayHello(const Cat* cat, const char* greeting);
Dog_SayHello(const Dog* dog, const char* greeting);

and commenting out the 'downcast' also compiles and runs fine. However, this generates a compiler warning for function signature mismatch.

JonatanE asked Jul 17 '15 13:07


2 Answers

I've been doing objects in C for many years, using exactly the kind of composition you are doing here. I'm going to recommend that you not do the simple cast you are describing, but to justify that I need an example. Take, for instance, a timer callback mechanism used with a layered implementation:

typedef struct MSecTimer_struct MSecTimer;
struct MSecTimer_struct {
     DoubleLinkedListNode       m_list;
     void                       (*m_expiry)(MSecTimer *);
     unsigned int               m_ticks;
     unsigned int               m_needsClear: 1;
     unsigned int               m_user: 7;
};

When one of these timers expires the managing system calls the m_expiry function and passes in the pointer to the object:

timer->m_expiry(timer);

Then take a base object that does something amazing:

typedef struct BaseDoer_struct BaseDoer;
struct BaseDoer_struct
{
     DebugID      m_id;
     void         (*v_beAmazing)(BaseDoer *);  //object's "virtual" function
};

//BaseDoer's version of BaseDoer's 'virtual' beAmazing function
void BaseDoer_v_BaseDoer_beAmazing( BaseDoer *self )
{
    printf("Basically, I'm amazing\n");
}

My naming system has a purpose here, but that's not really the focus. We can see a variety of object oriented function calls that might be needed:

typedef struct DelayDoer_struct DelayDoer;
struct DelayDoer_struct {
     BaseDoer     m_baseDoer;
     MSecTimer    m_delayTimer;
};

//DelayDoer's version of BaseDoer's 'virtual' beAmazing function
void DelayDoer_v_BaseDoer_beAmazing( BaseDoer *base_self )
{
     //instead of just casting, have the compiler do something smarter
     DelayDoer *self = GetObjectFromMember(DelayDoer,m_baseDoer,base_self);

     MSecTimer_start(&self->m_delayTimer,1000);  //make them wait for it
}

//DelayDoer::DelayTimer's version of MSecTimer's 'virtual' expiry function
void DelayDoer_DelayTimer_v_MSecTimer_expiry( MSecTimer *timer_self )
{
    DelayDoer *self = GetObjectFromMember(DelayDoer,m_delayTimer,timer_self);
    BaseDoer_v_BaseDoer_beAmazing(&self->m_baseDoer);
}

I've been using the same macro for GetObjectFromMember since around 1990, and somewhere along the line the Linux kernel created the same macro and called it container_of (the parameters are in a different order though):

  #define GetObjectFromMember(ObjectType,MemberName,MemberPointer) \
              ((ObjectType *)(((char *)MemberPointer) - ((char *)(&(((ObjectType *)0)->MemberName)))))

which relies on (technically) undefined behavior (dereferencing a null pointer), but is portable to every old (and new) C compiler I've ever tested. The newer version requires the offsetof macro, which is part of the standard (since C89):

#define container_of(ptr, type, member) ({ \
            const typeof( ((type *)0)->member ) *__mptr = (ptr); \
            (type *)( (char *)__mptr - offsetof(type,member) );})

I, of course, prefer my name, but whatever. Using this method makes your code not rely on putting the base object first, and also makes the second use case possible, which I find very useful in practice. All of the aliasing compiler issues are managed within the macro (casting through the char * I think, but I'm not really a standards lawyer).
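Note that the kernel-style macro as shown uses a statement expression ({ ... }) and typeof, both of which are GNU extensions. Since the question asks for standard C, here is a sketch of the same idea built only on offsetof. The macro name CONTAINER_OF and the Timer/Doer types are hypothetical stand-ins, not the types from this answer. Unlike the kernel version, it does no type check on the member pointer and drops any const qualifier.

```c
#include <stddef.h> /* offsetof */

/* Hypothetical standard-C variant: step back from the member's address
 * by the member's offset to reach the enclosing object. */
#define CONTAINER_OF(ObjectType, MemberName, MemberPointer) \
    ((ObjectType *)((char *)(MemberPointer) - offsetof(ObjectType, MemberName)))

/* Hypothetical stand-in types for illustration only. */
typedef struct Timer_
{
    unsigned int ticks;
} Timer;

typedef struct Doer_
{
    int id;
    Timer delayTimer; /* deliberately not the first member */
} Doer;

/* Recover the enclosing Doer from a pointer to its embedded timer.
 * Only valid if `timer` really points at the delayTimer of a Doer. */
Doer *Doer_FromTimer(Timer *timer)
{
    return CONTAINER_OF(Doer, delayTimer, timer);
}
```

This is a widely used idiom, and it works for members at any offset, which is what makes the timer callback case above possible.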

Speed8ump answered Oct 23 '22 04:10


From the section of the standard that you quoted:

A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa

It's definitely safe to convert a pointer like &cat->hello into a Cat pointer, and similarly for &dog->hello, so the casts in your SayHello functions should be fine.

At the call site, you're doing the opposite: converting a pointer to a structure into a pointer to the first element. That's also guaranteed to work.
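Both directions can be illustrated with a small sketch. It assumes the pointer passed in really is the address of the hello member of a live Cat object; the function name Cat_AgeFromHello is hypothetical and only serves to make the downcast observable.

```c
typedef struct IHello_
{
    void (*SayHello)(const struct IHello_* self, const char* greeting);
} IHello;

typedef struct Cat_
{
    IHello hello; /* first member */
    const char* name;
    int age;
} Cat;

/* Hypothetical helper: downcast from the interface pointer back to the
 * enclosing Cat and read its state. This relies on `self` pointing at
 * the first member of an actual Cat object. */
int Cat_AgeFromHello(const IHello* self)
{
    const Cat* cat = (const Cat*) self;
    return cat->age;
}
```

The access to cat->age goes through an lvalue of type int on an object whose effective type is Cat, which is why the downcast is not an aliasing problem under this assumption. Passing the address of a standalone IHello, or of a Dog, would make the same cast invalid.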

Mark Bessey answered Oct 23 '22 03:10