C++ value representation of non-trivially-copyable types

The current draft of the C++ standard (March 2019) has the following paragraph ([basic.types] p.4) (emphasis mine):

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

Why is the highlighted sentence (the last one, beginning "For trivially copyable types") limited to trivially copyable types? Is it because some bits from the value representation of a non-trivially-copyable object may be outside its object representation? This answer, as well as this one, implies this.

However, in the answers linked above, the conceptual value of the object is based on semantics that are introduced by the user. In the example from the first linked answer:

#include <string>

class some_other_type
{
    int a;
    std::string s;  // the characters may live in storage outside this object
};

the user decides that the value of an object of type some_other_type includes the characters belonging to string s.

I tried to think of examples where part of the value representation of a non-trivially-copyable object lies outside its object representation implicitly, that is, because the implementation has to do this and not because the user arbitrarily decided it.

One example that I came up with is the fact that the value representation of a base class subobject with virtual methods may include bits from the object representation of the complete object to which it belongs, because the base class subobject may behave differently (may "have a different value") compared to the situation in which it would be a complete object itself.
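The scenario in the previous paragraph can be sketched as follows. This is only an illustration of the observable behavior, not a statement about what the standard counts as value representation; the names Base, Derived, describe and demo_dispatch are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// A class with virtual functions is never trivially copyable.
static_assert(!std::is_trivially_copyable<Base>::value,
              "Base has virtual functions, so it is not trivially copyable");

// Calls name() through a Base reference. The result depends on the
// complete object that the Base (sub)object belongs to.
std::string describe(const Base& b) { return b.name(); }

// Usage: the same static type gives different behavior.
inline void demo_dispatch() {
    Base b;
    Derived d;
    assert(describe(b) == "Base");
    assert(describe(d) == "Derived");
}
```

Here the Base subobject inside d behaves differently from the complete object b, even though both are accessed as Base.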

Another example that I thought of is the fact that a vtable may also be part of the value representation of the object whose vtable pointer points to it.

Are these examples correct? Are there other examples?

Was the highlighted sentence introduced by the standard committee because the semantic "value" of an object may be decided by the user (as in the two linked answers), because implementations may decide (or may be forced) to do this, or both?

Thank you.

user42768 asked Mar 18 '19 16:03



1 Answer

In my interpretation, the focus of the sentence you highlighted is this part:

For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

Essentially, [basic.types]#4 of the standard says: "Each object has a set of bits O that is its object representation and a set of bits V that is its value representation. The set P = O without V is the set of padding bits. For trivially copyable types, V is a subset of O." The last part is important because it means that, for trivially copyable types, copying the bits of O also copies V, so the value is preserved. How you define V for other types is of no concern here (set it to the entire abstract machine if you want).
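As a minimal sketch of what "copying O also copies V" means in practice, the following copies a trivially copyable object byte by byte and gets the same value back. Point, copy_by_bytes and demo_copy are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical example type: two ints, trivially copyable.
struct Point {
    int x;
    int y;
};

static_assert(std::is_trivially_copyable<Point>::value,
              "Point must be trivially copyable for a byte copy to preserve its value");

// Copies all sizeof(T) bytes of the object representation into a new object.
template <typename T>
T copy_by_bytes(const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte copies only preserve the value of trivially copyable types");
    T dst;
    std::memcpy(&dst, &src, sizeof(T));
    return dst;
}

// Usage: the value survives the round trip through raw bytes.
inline void demo_copy() {
    Point p{3, 4};
    Point q = copy_by_bytes(p);
    assert(q.x == 3 && q.y == 4);
}
```

The static_assert inside copy_by_bytes rejects types such as std::string at compile time, which is exactly the case where V is not guaranteed to lie inside O.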


To answer the revised question asked in the comments:

why can't an implementation tell what 1110000100010001111 means if it were the object representation of a non-trivially-copyable object? Is it because there are some other bits (outside of this object representation) that help decide what value the object has?

Let's take std::string as an example. It is not trivially copyable because it has to deal with memory management.

If two std::string objects had the same bit pattern, would they mean the same thing?

No. At least one implementation (gcc's libstdc++) indicates small string optimization (SSO) by having its buffer pointer point into the string object itself. Upon destruction, the buffer is deallocated if (and only if) the pointer does not point to that exact location.

Clearly, two std::string objects residing in different locations would have to (in this implementation) represent the same (small) string value with different bit patterns (the buffer pointers would have to be different). And more importantly, the same bit pattern in two objects can mean very different things - it might indicate SSO in one case but not the other.

As you can see, there is additional information participating in the value representation of each std::string here: its location in memory (i.e. the value of this). How exactly that is represented in terms of bits is not further specified by the standard.
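The effect can be reproduced with a toy type. This is a sketch under assumptions: TinyString, means_inline_at and demo_location_matters are hypothetical, and the layout only mimics the self-pointing idea described above, not the actual layout of any std::string implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Toy string: `data` points at `local` while the text fits inline,
// otherwise at a heap buffer.
struct TinyString {
    char* data;
    char local[16];

    explicit TinyString(const char* s) {
        std::size_t n = std::strlen(s);
        data = (n < sizeof(local)) ? local : new char[n + 1];
        std::memcpy(data, s, n + 1);
    }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;
    ~TinyString() {
        if (!is_local()) delete[] data;  // free only heap buffers
    }
    bool is_local() const { return data == local; }
};

static_assert(!std::is_trivially_copyable<TinyString>::value,
              "the user-provided destructor makes TinyString non-trivially-copyable");
static_assert(std::is_standard_layout<TinyString>::value,
              "offsetof below relies on standard layout");

// Decides whether the byte pattern `bytes` denotes the inline state for an
// object located at `address`. The answer depends on `address`, which is not
// part of the bytes themselves.
bool means_inline_at(const unsigned char* bytes, const void* address) {
    char* stored = nullptr;
    std::memcpy(&stored, bytes + offsetof(TinyString, data), sizeof stored);
    const char* local_here =
        static_cast<const char*>(address) + offsetof(TinyString, local);
    return stored == local_here;
}

// Usage: identical bytes, different meaning at a different location.
inline void demo_location_matters() {
    TinyString a("hi");
    unsigned char bytes[sizeof(TinyString)];
    std::memcpy(bytes, &a, sizeof(TinyString));  // snapshot of a's bytes
    alignas(TinyString) unsigned char elsewhere[sizeof(TinyString)];
    assert(means_inline_at(bytes, &a));          // inline where a lives
    assert(!means_inline_at(bytes, elsewhere));  // not inline anywhere else
}
```

The same bytes read as "inline buffer" at a's address and as "external buffer" at any other address, so the object representation alone does not determine the value.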

Max Langhof answered Oct 29 '22 17:10
