
How to access an object's storage through an aggregate

In "Lvalues and rvalues", [basic.lval] (3.10), the C++ standard contains a list of types such that it is valid to "access the stored value of an object" through a glvalue of such a type (paragraph 10). Specifically, it says:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,

  • [some unimportant details about CV and signed/unsigned]

  • an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

  • [some more stuff]

What exactly does the "aggregate" rule mean? How do I access an object's stored value through a glvalue of some general aggregate type?!

I'm picturing something like this:

int a = 10;                                      // my "stored value"

struct Foo { char x; float y; int z; bool w; };  // an aggregate

reinterpret_cast<Foo&>(a).y = 0;                 // ???

Doesn't the final cast produce a glvalue of "an aggregate type that includes the dynamic type of a", and thus make this valid?

Kerrek SB asked Nov 28 '13 21:11



2 Answers

The intent of that list is not to provide you with alternate methods to access an object but rather, as the footnote to the list indicates, to list all the ways an object might be aliased. Consider the following example:

struct foo
{
    char x; 
    float y; 
    int z; 
    bool w;
};

void func( foo &F, int &I, double &D )
{
    //...
}

What that list is saying is that accesses to F may also access the same underlying object as accesses to I. This could happen if you passed a reference to F.z in for I, like this:

func(F, F.z, D); 

On the other hand, you can safely assume no access to F accesses the same underlying object as D, because struct foo does not contain any members of type double.

That's true even if some joker does this:

union onion
{
    struct foo F;
    double D;
};

onion o; 
int i;

func( o.F, i, o.D );  // [class.union] (9.5) wants a word with you.  UB.

I'm not sure that the union was central to your question. But the part before the union example highlights why the aggregate rule exists.
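To make the aliasing point concrete, here is a minimal sketch. The helper names write_then_read and check_aliasing are made up for illustration. Because foo contains an int member, the compiler has to assume that I might refer to F.z, so it cannot reuse the value it just stored in F.z after the write through I.

```cpp
#include <cassert>

struct foo
{
    char x;
    float y;
    int z;
    bool w;
};

// Hypothetical helper: stores to F.z, stores through I, then reads F.z back.
// Since foo has an int member, I may alias F.z, and the final read must
// observe the store through I when it does.
int write_then_read(foo &F, int &I)
{
    F.z = 1;
    I = 2;
    return F.z;
}

// Hypothetical driver: one call where I is unrelated, one where I is F.z.
void check_aliasing()
{
    foo F{'a', 1.0f, 0, true};
    int separate = 0;
    assert(write_then_read(F, separate) == 1);  // I does not alias F.z
    assert(write_then_read(F, F.z) == 2);       // I aliases F.z
}
```

Had the second parameter been a double&, the compiler would be free to assume the store through it leaves F.z untouched, since foo has no double member.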

Now let's consider your example, the statement reinterpret_cast<Foo&>(a).y = 0; from the question. [expr.reinterpret.cast] (5.2.10), paragraph 11, has this to say:

An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). The result refers to the same object as the source lvalue, but with a different type. The result is an lvalue for an lvalue reference type or an rvalue reference to function type and an xvalue for an rvalue reference to object type. No temporary is created, no copy is made, and constructors (12.1) or conversion functions (12.3) are not called.[71]

[71] This is sometimes referred to as a type pun.

In the context of your example, it's saying that if it's legal to convert a pointer-to-int to a pointer-to-Foo, then your reinterpret_cast<Foo&>(a) is legal and produces an lvalue. (Paragraph 1 tells us it will be an lvalue.) And, as I read it, that pointer conversion is itself OK, according to paragraph 7:

A pointer to an object can be explicitly converted to a pointer to a different object type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.

You have standard-layout types with compatible alignment constraints. So, what you have there is a type pun that yields an lvalue. The rule you listed does not on its own make it undefined.
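The premises of paragraph 7 can be checked with a short sketch. The helper names alignment_ok and round_trips are made up for illustration, and the alignment comparison is implementation-defined, so it is computed rather than assumed. Note that the Foo pointer is never dereferenced here; only the conversion itself is exercised.

```cpp
#include <type_traits>

struct Foo { char x; float y; int z; bool w; };

// Both types from the question are standard-layout types.
static_assert(std::is_standard_layout<int>::value, "int is standard-layout");
static_assert(std::is_standard_layout<Foo>::value, "Foo is standard-layout");

// Hypothetical helper: true when Foo's alignment requirement is no stricter
// than int's. The actual values depend on the implementation.
constexpr bool alignment_ok()
{
    return alignof(Foo) <= alignof(int);
}

// Hypothetical helper: converts int* to Foo* and back without dereferencing
// the Foo*. When alignment_ok() holds, paragraph 7 says the original pointer
// value comes back.
bool round_trips(int &a)
{
    Foo *as_foo = reinterpret_cast<Foo *>(&a);
    int *back = reinterpret_cast<int *>(as_foo);
    return back == &a;
}
```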

So what might make it undefined? Well, for one, [class.mem] (9.2) paragraph 21 reminds us that a pointer to a standard-layout struct object points to its initial member, and vice versa. And so, after your type pun, you're left with a reference to Foo, such that Foo's x is at the same location as a.
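The initial-member guarantee is easy to see on a real Foo object, where every access is well-defined. This is only a sketch, and the helper name points_to_initial_member is made up for illustration.

```cpp
#include <type_traits>

struct Foo { char x; float y; int z; bool w; };

static_assert(std::is_standard_layout<Foo>::value, "Foo is standard-layout");

// Hypothetical helper: true when a pointer to the whole object, converted
// with reinterpret_cast, has the same address as the first member x.
bool points_to_initial_member(Foo &f)
{
    return reinterpret_cast<char *>(&f) == &f.x;
}
```

In the question's cast the object underneath is an int rather than a Foo, so the same layout rule would put Foo's x on top of the first byte of a.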

And... this is where my language lawyering peters out. I know in my gut that accessing Foo through that franken-reference is at best implementation-defined or unspecified. I can't find where it's explicitly banished to the realm of undefined behavior.

But, I think I answered your original question: Why is the aggregate rule there? It gives you a very basic way to rule on potential aliases without further pointer analysis.

Joe Z answered Jan 19 '23 02:01



That item of the clause simply covers normal access to the members of any aggregate (struct, class, or array) or union: you need to be able to access the stored values of such objects without causing undefined behavior. The clause states only necessary conditions: at least one of the items has to hold. It does not state sufficient conditions, i.e., other conditions may need to hold in addition to these.
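As one possible reading of "normal access", here is a small sketch that works on a genuine Foo object. The helper names read_z and copy_whole are made up for illustration, and whether a whole-object copy formally counts as an access through the aggregate type is an interpretation, not something the quoted text settles.

```cpp
struct Foo { char x; float y; int z; bool w; };

// Hypothetical helper: reaches the stored int through an lvalue of the
// aggregate type Foo by naming its member.
int read_z(const Foo &f)
{
    return f.z;
}

// Hypothetical helper: copies the whole aggregate, which reads the stored
// value of every member, including the int z.
Foo copy_whole(const Foo &source)
{
    Foo result = source;
    return result;
}
```

Both functions start from an object whose dynamic type really is Foo, which is the condition the question's reinterpret_cast does not meet.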

Dietmar Kühl answered Jan 19 '23 01:01
