
"no base classes of the same type as the first non-static data member"

Tags: c++, c++11, struct

I asked this a while ago on comp.std.c++ and got no reply.

I'm just going to quote my post there with little modification.


Is the last requirement of standard-layout classes, 9/6, necessary or useful?

A footnote explanation is provided:

This ensures that two subobjects that have the same class type and that belong to the same most-derived object are not allocated at the same address (5.10).

Taken alone, the footnote is incorrect. Two empty base classes with a common base class may produce two instances of the base class at the same address.

struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};

D d;
static_cast<A*>(static_cast<B*>(&d))
   == static_cast<A*>(static_cast<C*>(&d)); // allowed per 1.8/5
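
A minimal sketch of the comparison above in compilable form, for checking what a particular compiler does. The function name `a_subobjects_share_address` is hypothetical and not part of the original post. Whether it returns true is implementation-specific under the wording quoted in this question; GCC is reported further down to give the two A subobjects different addresses.

```cpp
#include <cassert>

struct A {};
struct B : A {};
struct C : A {};
struct D : B, C {};

// Hypothetical helper: reaches the A subobject once through B and once
// through C, then reports whether the two pointers compare equal.
bool a_subobjects_share_address(D& d)
{
    A* via_b = static_cast<A*>(static_cast<B*>(&d));
    A* via_c = static_cast<A*>(static_cast<C*>(&d));
    return via_b == via_c;
}
```

Calling the helper on a local `D d;` answers the question for one compiler and ABI only; it says nothing about what the Standard requires.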

Taken in the context of 5.10, subobjects are only mentioned in the comparison requirements of pointers to members. Base subobjects are irrelevant. Moreover, it doesn't make sense to give special status to comparison between a (scalar) pointer to a member subobject and a pointer to a base subobject above that of comparison between pointers to base subobjects.

There wasn't such a restriction in C++03. Even if there is an ABI out there that requires every member to be allocated at a different address from any base of the same type, yet already allows the empty base class optimization on the above code, I think the ABI is buggy and the standard shouldn't capture this.

The language goes back to N2172 which suggests that multiple inheritance might cause trouble and need to be disallowed in standard-layout classes to ensure ABI compatibility; however, that was ultimately allowed and in that light the requirement doesn't make sense.


For reference, 1.8/5-6:

5 Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

6 Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two distinct objects that are neither bit-fields nor base class subobjects of zero size shall have distinct addresses.

(footnote) Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference.

Additional notes:

10.1/8 refers to the same mystery content at 5.10, but it's also just an informative note.

[Note: … A base class subobject may be of zero size (Clause 9); however, two subobjects that have the same class type and that belong to the same most derived object must not be allocated at the same address (5.10). — end note ]

GCC appears to guarantee that empty base subobjects of the same type are given unique addresses. Example program and output. This seems sufficient to guarantee that objects of a given type are uniquely identified by address. That would be above and beyond the guarantees of the C++ object model, §1.8. Of course this is a good idea, but it doesn't seem required by the Standard. Likewise, the platform ABI can extend this guarantee to a class with the first member aliasing an empty base. The language sets minimum requirements for ABIs; an ABI can add a language feature, and other ABIs can follow suit, and the process of catch-up by the Standard is simply error-prone.

My question is whether the given requirement accomplishes anything in the context of the Standard, not whether it is useful to the user in concert with other ABI guarantees. Evidence that such a unique-address guarantee was intended, and only omitted by accident, would also make the requirement more meaningful.


To summarize the answer (or my conclusion, anyway):

The requirement does not theoretically ensure anything, as it's possible anyway to ensure that all objects of a given type have different addresses. When the address of an empty base class subobject conflicts with another object (either another base or a member), the compiler may simply assign it an arbitrary location within the structure. As the standard-layout rules only describe the locations of data members (possibly inherited), the locations of empty bases are still unspecified and perhaps incompatible between similar standard-layout classes. (The locations of non-empty bases are still unspecified as far as I've noticed, and then it's not clear what is meant by "first member" in that case, but they must be consistent in any case.)

In practice, the requirement allows implementations to continue using existing ABIs so long as they include the empty base class optimization. Existing compilers may disable the EBO when the requirement is violated, to avoid the address of the base coinciding with the address of the first member. If the Standard didn't restrict programs this way, libraries and programs would have to be recompiled with updated C++0x compilers… not worth it!

Potatoswatter, asked Oct 10 '10 22:10


3 Answers

One of the "special abilities" of a standard-layout class is that you can reinterpret_cast a pointer to a standard-layout class object to a pointer to the type of its first data member, and thus obtain a pointer to that member. [Edit: 9.2/19] Further, a standard-layout class with non-static data members is permitted to have empty bases. As you no doubt know, most implementations put base class sub-objects at the start of complete objects. This combination of restrictions effectively mandates that the empty-base-class optimization is applied to all bases of standard-layout classes.
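
As a rough illustration of that first-member guarantee, here is a small sketch. The names `Empty`, `Packet` and `first_member_at_start` are assumptions made up for the example. The class has one empty base and keeps all of its data members in the derived class, so it should qualify as standard-layout.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};

// Hypothetical standard-layout class: an empty base, with every
// non-static data member declared in the derived class.
struct Packet : Empty {
    int    id;
    double payload;
};

static_assert(std::is_standard_layout<Packet>::value,
              "Packet is expected to be standard-layout");

// True if a pointer to the whole object, reinterpreted as a pointer to
// the first member's type, compares equal to the address of that member.
bool first_member_at_start(Packet& p)
{
    return reinterpret_cast<int*>(&p) == &p.id;
}
```

For the check to hold, the empty base cannot take up any room ahead of `id`, which is the sense in which the guarantee forces the empty-base-class optimization.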

However, as other answers have explained, all base class sub-objects and member sub-objects that are part of the same complete object must be distinct, i.e., have different addresses if they are of the same type. Classes which violate your bullet point (that have a base class that is the same type as the first member) can't have the empty-base-class optimization fully applied, and thus can't be standard-layout classes if the base classes are positioned at the start of the complete object.

So I'm pretty sure this is what it's getting at - it's trying to say "if a class has base classes, and the empty-base-class optimization can't be applied, then the class is not standard-layout".
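
The rule can be probed with `std::is_standard_layout` from `<type_traits>`. This is only a sketch: the struct names are hypothetical, and the asserted results are what I would expect from a compiler that implements the C++11 rule as quoted in the question.

```cpp
#include <type_traits>

struct E {};

// Base E and a first non-static data member of type E: the case the
// last standard-layout requirement excludes.
struct SameTypeFirst : E { E x; };

// Same base, but the first non-static data member is an int, so the
// member of type E no longer competes with the base for offset zero.
struct OtherFirst : E { int n; E x; };

static_assert(!std::is_standard_layout<SameTypeFirst>::value,
              "base of the same type as the first member");
static_assert(std::is_standard_layout<OtherFirst>::value,
              "first member has a different type than the base");
```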

Edit: I'm being a bit slack with terminology here - it's possible to construct cases where the empty base class optimization can't be fully applied among the base classes (for example, in your struct D), but that doesn't matter because the base classes can still start at the beginning of the object, and conceptually "overlay" the data members, similar to a union. As you say, the base sub-objects get their addresses incremented if they (or a base) would otherwise overlay another base. While it's possible for the same thing to happen to bases of standard-layout classes (if they would overlap a data member of the same type), this would break existing ABIs, and add a special case for little gain.


You're saying that this is "forbidding" a possibility - it's not really forbidding, from my point of view, it's just not granting "standard-layout" status to types that didn't have that originally anyway (classes with bases were not PODs in C++03). So it's not forbidding such types, it's just saying that they don't get the special standard-layout treatment, which they weren't guaranteed in the first place.


Regarding my assertion that non-static data member subobjects and base subobjects are distinct, see if you find this convincing:

  • 5.9/2 (relational operators on pointers) makes it clear that no two data member subobjects (at least, with the same access specifier) have the same address as one another.
  • 5.3.1/1 (the unary operator*) says "the expression to which it is applied shall be a pointer to an object type [snip] and the result is an lvalue referring to the object to which the expression points." (emphasis added) This implies that there is at most one object of a given type at a particular address, at a given time.
  • 1.8/2 "A subobject can be a member subobject (9.2), a base class subobject (Clause 10), or an array element."... I think this implies that the categories are mutually exclusive (even if their storage overlaps). Other parts of the standard pretty strongly imply that base subobjects and member subobjects are distinct (e.g. 12.6.2).
  • Steve M's citation of 10.1/4 "For each distinct occurrence of a non-virtual base class in the class lattice of the most derived class, the most derived object (1.8) shall contain a corresponding distinct base class subobject of that type." - I believe this means that different bases must be at different addresses, or else they would not be "distinct" objects - there would be no way to distinguish them during their common lifetime.

I don't know how convincing this is, if you don't consider footnotes as normative or sufficiently indicating intention. For what it's worth, Stroustrup explains derived classes in "The C++ Programming Language" 12.2 in terms of member objects that have compiler-supported conversion from derived to base. Indeed, at the very end of this section, he explicitly says: "Using a class as a base is equivalent to declaring an (unnamed) object of that class. Consequently, a class must be defined in order to be used as a base (section 5.7)."


Also: it seems that GCC 4.5 does not bump up the base class in this specific situation, even though it does bump up the bases where you have repeated base classes (as you showed):

#include <assert.h>
#include <iostream>

struct E {};
struct D: E { E x ; };

int main()
{
   D d;
   std::cerr << "&d: " << (void*)(&d) << "\n";
   std::cerr << "&d.x: " << (void*)(&(d.x)) << "\n";
   std::cerr << "(E*)&d: " << (void*)(E*)(&d) << "\n";
   assert(reinterpret_cast<E *>(&d) == &d.x); //standard-layout requirement
}

Output (Linux x86-64, GCC 4.5.0):

&d: 0x7fffc76c9420
&d.x: 0x7fffc76c9421
(E*)&d: 0x7fffc76c9420
testLayout: testLayout.cpp:19: int main(): Assertion `reinterpret_cast<E *>(&d) == &d.x' failed.
Aborted
Doug, answered Nov 14 '22 22:11


If you put that equality expression inside of an assert(), you'll find that it fails. The A sub-objects are at separate locations. That's the correct behavior without specifying virtual:

struct B : virtual A {};
struct C : virtual A {};

With virtual, D is already not a standard-layout class, by the second rule. This is the case in C++ '98, '03, and '0x.

Edit to reflect comments:

Edit again: Never mind, this isn't sufficient.

The point of the standard-layout class definition is to specify something that can be used with other languages. Let's use C as an example. In the general case, the following C++ class

struct X : public B{
  B b;
  int i;
};

would be equivalent to this C struct:

struct X{
  B base;
  B b;
  int i;
};

If B were an empty class and the empty-base optimization were applied, X would be equivalent to this in C:

struct X{
  B b;
  int i;
};

But, the C side of the interaction isn't going to know about that optimization. Instances of the C++ X and the C X would be incompatible. The restriction prevents this scenario.

Steve M, answered Nov 14 '22 23:11


Two empty base classes with a common base class must produce two instances of the base class at the same address.

I don't think so. In fact, a quick check with my copy of g++ shows two distinct addresses for the A subobjects, i.e. the comparison in your code above evaluates to false.

The fact is, we must have two A objects by the way your classes are written. If two objects share the same address they are not two different objects in any meaningful sense. Thus, it is required that distinct addresses exist for the instances of the A object.

Suppose that A is defined like so:

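#include <set>

class A
{
public:
   A() { instances.insert(this); }
   ~A() { instances.erase(this); }   // std::set provides erase(), not remove()
private:
   static std::set<A*> instances;    // addresses of all live A objects
};

std::set<A*> A::instances;           // definition of the static member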

If both copies of A were allowed to share an address, this code would not function as intended. I believe situations like these are why the decision was made that different copies of A ought to have distinct addresses. Of course, it's the weirdness of situations like this that makes me avoid multiple inheritance.
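
To make the registry argument concrete, here is a self-contained sketch. All names (`Tracked`, `Left`, `Right`, `Both`, `count_while_alive`) are hypothetical, and the set lives in a function-local static only to keep the example in one piece. It counts how many distinct addresses get registered while an object with two base subobjects of the tracked type is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical registry class: records the address of every live instance.
// It has no non-static data members, so it is still an empty class.
class Tracked {
public:
    Tracked()  { live().insert(this); }
    ~Tracked() { live().erase(this); }
    static std::size_t live_count() { return live().size(); }
private:
    static std::set<Tracked*>& live()
    {
        static std::set<Tracked*> instances;
        return instances;
    }
};

struct Left  : Tracked {};
struct Right : Tracked {};
struct Both  : Left, Right {};   // contains two Tracked base subobjects

// Number of distinct addresses registered while one Both object exists.
std::size_t count_while_alive()
{
    Both b;
    (void)b;
    return Tracked::live_count();
}
```

If the two base subobjects shared an address, the set would hold a single entry, and the first destructor to run would erase it while the other subobject was still alive. With g++, which is reported above to place them at different addresses, I would expect the count to be 2.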

Winston Ewert, answered Nov 14 '22 23:11