 

PODs and inheritance in C++11. Does the address of the struct == address of the first member?

(I've edited this question to avoid distractions. There is one core question which would need to be cleared up before any other question would make sense. Apologies to anybody whose answer now seems less relevant.)

Let's set up a specific example:

struct Base {
    int i;
};

There are no virtual methods and no inheritance; it is a very simple object. Hence it is Plain Old Data (POD) and has a predictable layout. In particular:

Base b;
&b == reinterpret_cast<Base*>(&b.i);

This is according to Wikipedia (which itself claims to reference the C++03 standard):

A pointer to a POD-struct object, suitably converted using a reinterpret cast, points to its initial member and vice versa, implying that there is no padding at the beginning of a POD-struct.[8]

Now let's consider inheritance:

struct Derived : public Base {
};

Again, there are no virtual methods, no virtual inheritance, and no multiple inheritance. Therefore this is POD also.

Question: Does this fact (Derived is POD in C++11) allow us to say that:

Derived d;
&d == reinterpret_cast<Derived*>(&d.i); // true on g++-4.6

If this is true, then the following would be well-defined:

Base *b = reinterpret_cast<Base*>(malloc(sizeof(Derived)));
free(b); // It will be freeing the same address, so this is OK

I'm not asking about new and delete here; it's easier to consider malloc and free. I'm just curious about the rules governing the layout of derived objects in simple cases like this, where the initial non-static member of the base class is in a predictable location.

Is a Derived object supposed to be equivalent to:

struct Derived { // no inheritance
    Base b; // it just contains it instead
};

with no padding beforehand?

Aaron McDaid asked Jan 14 '12 18:01


3 Answers

You don't care about POD-ness, you care about standard-layout. Here's the definition, from the standard section 9 [class]:

A standard-layout class is a class that:

  • has no non-static data members of type non-standard-layout class (or array of such types) or reference,
  • has no virtual functions (10.3) and no virtual base classes (10.1),
  • has the same access control (Clause 11) for all non-static data members,
  • has no non-standard-layout base classes,
  • either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
  • has no base classes of the same type as the first non-static data member.

And the property you want is then guaranteed (section 9.2 [class.mem]):

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.

This is actually better than the old requirement, because the ability to reinterpret_cast isn't lost by adding non-trivial constructors and/or a non-trivial destructor.


Now let's move to your second question. The answer is not what you were hoping for.

Base *b = new Derived;
delete b;

is undefined behavior unless Base has a virtual destructor. See section 5.3.5 ([expr.delete]):

In the first alternative (delete object), if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the dynamic type of the object to be deleted and the static type shall have a virtual destructor or the behavior is undefined.


Your earlier snippet using malloc and free is mostly correct. This will work:

Base *b = new (malloc(sizeof(Derived))) Derived;
free(b);

because the value of pointer b is the same as the address returned from placement new, which is in turn the same address returned from malloc.

Ben Voigt answered Sep 29 '22 12:09


Presumably your last bit of code is intended to say:

Base *b = new Derived;
delete b;  // delete b, not d.

In that case, the short answer is that it remains undefined behavior. The fact that the class or struct in question is POD, standard layout or trivially copyable doesn't really change anything.

Yes, you're passing the right address, and yes, you and I know that in this case the dtor is pretty much a nop -- nonetheless, the pointer you're passing to delete has a different static type than dynamic type, and the static type does not have a virtual dtor. The standard is quite clear that this gives undefined behavior.

From a practical viewpoint, you can probably get away with the UB if you really insist -- chances are pretty good that there won't be any harmful side effects from what you're doing, at least with most typical compilers. Beware, however, that even at best the code is extremely fragile so seemingly trivial changes could break everything -- and even switching to a compiler with really heavy type checking and such could do so as well.

As far as your argument goes, the situation's pretty simple: it basically means the committee probably could make this defined behavior if they wanted to. As far as I know, however, it's never been proposed, and even if it had it would probably be a very low priority item -- it doesn't really add much, enable new styles of programming, etc.

Jerry Coffin answered Sep 29 '22 13:09


This is meant as a supplement to Ben Voigt's answer, not a replacement.

You might think that this is all just a technicality. That the standard calling it 'undefined' is just a bit of semantic twaddle that has no real-world effects beyond allowing compiler writers to do silly things for no good reason. But this is not the case.

I could see desirable implementations in which:

Base *b = new Derived;
delete b;

Resulted in behavior that was quite bizarre. This is because storing the size of your allocated chunk of memory when it is known statically by the compiler is kind of silly. For example:

struct Base {
};

struct Derived : public Base {
   int an_int;
};

In this case, when delete is called through a Base *, the compiler has every reason (because of the rule you quoted at the beginning of your question) to believe that the size of the data pointed at is 1, not 4. If, for example, it implements a version of operator new that keeps 1-byte entities densely packed in one array and 4-byte entities densely packed in a separate array, it will end up assuming the Base * points somewhere into the 1-byte array when in fact it points into the 4-byte array, and will make all kinds of interesting errors as a result.

I really wish operator delete had been defined to also take a size, with the compiler passing in either the statically known size if operator delete was called on an object with a non-virtual destructor, or the size of the actual object being pointed at if it was called as a result of a virtual destructor. This would likely have other ill effects, though, and maybe isn't such a good idea (for example, if there are cases in which operator delete is called without a destructor having been called). But it would make the problem painfully obvious.

Omnifarious answered Sep 29 '22 12:09