
What prevents overlapping of adjacent members in classes?

Consider the following three classes:

class blub {
    int i;
    char c;

    blub(const blub&) {}
};

class blob {
    char s;

    blob(const blob&) {}
};

struct bla {
    blub b0;
    blob b1;
};

On typical platforms where int is 4 bytes, the sizes, alignments and total padding[1] are as follows:

  struct   size   alignment   padding  
 -------- ------ ----------- --------- 
  blub        8           4         3  
  blob        1           1         0  
  bla        12           4         6  

There is no overlap between the storage of the blub and blob members, even though the size-1 blob could in principle "fit" in the tail padding of blub.

C++20 introduces the [[no_unique_address]] attribute, which allows adjacent empty members to share the same address. It also explicitly allows the scenario described above: using the tail padding of one member to store another. From cppreference (emphasis mine):

Indicates that this data member need not have an address distinct from all other non-static data members of its class. This means that if the member has an empty type (e.g. stateless Allocator), the compiler may optimise it to occupy no space, just like if it were an empty base. If the member is not empty, any tail padding in it may be also reused to store other data members.

Indeed, if we use this attribute on blub b0, the size of bla drops to 8, so the blob is indeed stored in the tail padding of the blub, as can be seen on Godbolt.

Finally, we get to my question:

What text in the standards (C++11 through C++20) prevents this overlapping without no_unique_address, for objects that are not trivially copyable?

I need to exclude trivially copyable (TC) objects from the above because, for TC objects, it is allowed to std::memcpy from one object to another, including member subobjects, and if the storage were overlapped this would break (because all or part of the storage for the adjacent member would be overwritten).[2]


[1] We calculate padding simply as the difference between the structure's size and the total size of all its constituent members, recursively.

[2] This is why I defined copy constructors: to make blub and blob not trivially copyable.

asked Jan 22 '20 at 20:01 by BeeOnRope


1 Answer

The standard is awfully quiet when talking about the memory model and not very explicit about some of the terms it uses. But I think I have found a workable argument (though it may be a bit weak).

First, let's find out what is even part of an object. [basic.types]/4:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits.

So the object representation of b0 consists of sizeof(blub) unsigned char objects, i.e. 8 bytes. The padding bits are therefore part of the object.

No object can occupy the storage of another unless it is nested within it ([basic.life]/1.5):

The lifetime of an object o of type T ends when:

[...]

(1.5) the storage which the object occupies is released, or is reused by an object that is not nested within o ([intro.object]).

So the lifetime of b0 would end if the storage it occupies were reused by another object, i.e. b1. I haven't checked this, but I think the standard mandates that the subobjects of a living object are also alive (and I can't imagine how it could work otherwise).

So the storage that b0 occupies may not be used by b1. I have found no definition of "occupy" in the standard, but I think a reasonable interpretation would be "part of the object representation". In the quote describing the object representation, the words "taken up" are used.[1] Here, that would be 8 bytes, so bla needs at least one more byte for b1.

For subobjects in particular (which include non-static data members), there is also the stipulation in [intro.object]/9 (though this was only added in C++20, thanks @BeeOnRope):

Two objects with overlapping lifetimes that are not bit-fields may have the same address if one is nested within the other, or if at least one is a subobject of zero size and they are of different types; otherwise, they have distinct addresses and occupy disjoint bytes of storage.

(emphasis mine) Here again, we have the problem that "occupies" is not defined, and again I would argue for taking it to mean the bytes of the object representation. Note that there is a footnote to this, [basic.memobj]/footnote 29:

Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference ([intro.execution]).

This may allow the compiler to break the rule if it can prove that there is no observable difference. I would think that such a proof is pretty complicated for something as fundamental as object layout. Maybe that is why this optimization is only performed when the user signals, by adding the [[no_unique_address]] attribute, that there is no reason to keep the objects disjoint.

tl;dr: Padding is part of the object, and members have to be disjoint.


[1] I could not resist adding a reference showing that "occupy" can mean "take up": Webster’s Revised Unabridged Dictionary, G. & C. Merriam, 1913 (emphasis mine)

  1. To hold, or fill, the dimensions of; to take up the room or space of; to cover or fill; as, the camp occupies five acres of ground. Sir J. Herschel.

What standard crawl would be complete without a dictionary crawl?

answered Oct 15 '22 at 20:10 by n314159