
Does C++ guarantee identical binary layout for "trivial" structs with a single trivial member?

We have some strictly typed integer types in our project:

struct FooIdentifier {
  int raw_id; // the only data member

  // ... more shenanigans, but it stays a "trivial" type.
};

struct BarIdentifier {
  int raw_id; // the only data member

  // ... more shenanigans, but it stays a "trivial" type.
};

Basically something as proposed here or similar to things used in a Unit Library.

These structs basically are integers, except to the type system.

My question is: does the C++ language guarantee that these types are laid out in memory exactly as a regular int would be?

Note: Since I can statically check whether the types have the same size (i.e. no padding), I'm really only interested in the case without surprising padding. I should have added this note from the beginning.

// Precondition. If the platform yields false here, I'm not interested in the result.
static_assert(sizeof(int) == sizeof(ID_t));

That is, does the following hold from a C++ Standard POV:

int integer_array[42] = {}; // zero init
ID_t id_array[42] = {}; // zero init

static_assert(sizeof(int) == sizeof(ID_t)); // Precondition. If the platform yields false here, I'm not interested in the result.

const char* const pIntArrMem = static_cast<const char*>(static_cast<const void*>(integer_array));
const char* const pIdArrMem = static_cast<const char*>(static_cast<const void*>(id_array));
assert(0 == memcmp(pIntArrMem, pIdArrMem, sizeof(int))); // Always ???
Martin Ba asked Mar 12 '21 09:03


3 Answers

TL;DR: No, the standard does not seem to guarantee it (as far as I can tell). Technically, you have to rely on having a sane ABI.

You may need to give up supporting the DS9K (the hypothetical DeathStation 9000, a conforming but maximally hostile implementation).


The standard doesn't explicitly guarantee much about layout. At best we can make some reasonable assumptions about what practical implementations could do based on guarantees that we do have.

[basic.compound]

Two objects a and b are pointer-interconvertible if:

  • ...
  • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or
  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.

From this, it follows that a standard-layout class cannot have padding before its first member.
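As a minimal sketch of that consequence, the check below uses a stripped-down FooIdentifier (a stand-in for the question's type) and a hypothetical helper, first_member_shares_address, that is not part of the original post. It only illustrates the pointer-interconvertibility rule quoted above; it says nothing about padding after the member.

```cpp
#include <cassert>      // assert
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

// Stripped-down stand-in for the question's identifier type.
struct FooIdentifier {
  int raw_id;
};

// Pointer-interconvertibility only applies to standard-layout classes.
static_assert(std::is_standard_layout<FooIdentifier>::value,
              "FooIdentifier must be standard-layout");
// The first member sits at the very start of the object.
static_assert(offsetof(FooIdentifier, raw_id) == 0,
              "no padding before the first member");

// Hypothetical helper: true when the object and its first member share an address.
bool first_member_shares_address() {
  FooIdentifier id{7};
  int* p = reinterpret_cast<int*>(&id);  // points at id.raw_id
  assert(*p == 7);                       // reads raw_id through the converted pointer
  return p == &id.raw_id;
}
```

Calling first_member_shares_address() should return true on any conforming implementation, since both static_asserts and the pointer comparison follow from the quoted wording.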

[expr.sizeof]

... When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. ... When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element.

This implies that neither integer_array nor id_array, nor any other array, has padding before, between, or after its elements.
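The array-size rule can be checked directly. The following is a small sketch, assuming ID_t is a single-int struct like the ones in the question; element_stride is a hypothetical helper introduced only for illustration.

```cpp
#include <cassert>  // assert, for the usage check mentioned below
#include <cstddef>  // std::size_t

// Assumed shape of the question's ID_t.
struct ID_t {
  int raw_id;
};

int integer_array[42] = {};  // zero init
ID_t id_array[42] = {};      // zero init

// The size of an array of n elements is n times the size of one element.
static_assert(sizeof(integer_array) == 42 * sizeof(int), "no array padding");
static_assert(sizeof(id_array) == 42 * sizeof(ID_t), "no array padding");

// Hypothetical helper: byte distance between two neighbouring elements.
std::size_t element_stride() {
  const char* first = reinterpret_cast<const char*>(&id_array[0]);
  const char* second = reinterpret_cast<const char*>(&id_array[1]);
  return static_cast<std::size_t>(second - first);
}
```

For example, assert(element_stride() == sizeof(ID_t)) holds: any padding an implementation adds is inside sizeof(ID_t), never between array elements.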

Given the lack of padding before the int subobject, your second assert would be a reasonable assumption unless an object could have one representation in one context and another representation in another context (free object vs. subobject, or subobject of a different enclosing type). For example, big endian in one and little endian in the other. I cannot find anything in the standard disallowing that, but I also cannot imagine how such an implementation could work in practice, given that the compiler cannot always know whether a particular glvalue refers to a subobject (and within which enclosing object) or not.

Given the above assumptions, the first assert boils down to "could the standard-layout class have padding after the only member?" Actually, this is entirely possible if alignas or some layout-affecting language extension is involved, but can we assume the negative if that is not the case? The standard doesn't say much, and I don't think it would be impossible in practice for a language implementation to add some padding; it just wouldn't be very useful.

What little the standard says about object representation:

[basic.types.general]

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values. 35

35) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.
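To make the notion of object representation concrete, here is a hedged sketch that copies the bytes of an int and of a wrapper into unsigned char buffers and compares them. ID_t is assumed to be a single-int struct, and same_object_representation is a hypothetical helper. The comparison is expected to succeed on mainstream ABIs; as discussed above, the standard text alone does not obviously promise it.

```cpp
#include <cassert>  // assert, for the usage check mentioned below
#include <cstring>  // std::memcpy, std::memcmp

// Assumed shape of the question's ID_t.
struct ID_t {
  int raw_id;
};

// Precondition from the question: no padding anywhere in the wrapper.
static_assert(sizeof(ID_t) == sizeof(int), "ID_t must be exactly int-sized");

// Hypothetical helper: compares the object representations of an int and
// of an ID_t holding the same value.
bool same_object_representation(int value) {
  int plain = value;
  ID_t wrapped{value};

  unsigned char plain_bytes[sizeof(int)];
  unsigned char wrapped_bytes[sizeof(ID_t)];
  std::memcpy(plain_bytes, &plain, sizeof plain);        // N = sizeof(int) bytes
  std::memcpy(wrapped_bytes, &wrapped, sizeof wrapped);  // N = sizeof(ID_t) bytes

  return std::memcmp(plain_bytes, wrapped_bytes, sizeof(int)) == 0;
}
```

A check such as assert(same_object_representation(42)) would only fail on an implementation that stores an int differently depending on whether it is a free object or a subobject.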


A little bit regarding whether FooIdentifier and BarIdentifier are guaranteed to have the same representation as each other:

[class.mem.general]

The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_unique_address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field.

Two standard-layout struct ([class.prop]) types are layout-compatible classes if their common initial sequence comprises all members and bit-fields of both classes ([basic.types]).

[basic.compound]

... Pointers to layout-compatible types shall have the same value representation and alignment requirements

The classes are layout-compatible, which sounds promising as a description, but has little effect on rules of the language.
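One place where layout-compatibility does matter is the common-initial-sequence rule for standard-layout unions in [class.mem]. The sketch below is an illustration under assumptions: AnyIdentifier and read_through_other_member are hypothetical names, and the std::is_layout_compatible check is guarded because it needs C++20 library support that not every toolchain ships.

```cpp
#include <cassert>      // assert, for the usage check mentioned below
#include <type_traits>  // std::is_layout_compatible (C++20, where available)

struct FooIdentifier {
  int raw_id;
};

struct BarIdentifier {
  int raw_id;
};

#if defined(__cpp_lib_is_layout_compatible)
// Only compiled where the C++20 trait exists.
static_assert(std::is_layout_compatible<FooIdentifier, BarIdentifier>::value,
              "the two identifier types are layout-compatible");
#endif

// Hypothetical union, used only to show the common-initial-sequence rule.
union AnyIdentifier {
  FooIdentifier foo;
  BarIdentifier bar;
};

// Reads raw_id through the inactive member; this is permitted because
// raw_id is part of the common initial sequence of both structs.
int read_through_other_member(int raw) {
  AnyIdentifier u = {FooIdentifier{raw}};  // foo is the active member
  return u.bar.raw_id;
}
```

With this, read_through_other_member(5) yields 5. Note that this relates FooIdentifier to BarIdentifier only; it does not by itself relate either struct to a plain int.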

eerorika answered Nov 19 '22 21:11


Challenging eerorika's answer, I believe you are guaranteed binary compatibility. I'll reference the C++11 spec for this.

Key pieces: [class/7] This defines a standard-layout class. It's pretty clear we all agree that these are standard layout.

[intro.object/5] and [intro.object/6]

An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies.

This bounds the shapes that a standard-layout object can have, and specifies what we can call "the address of" an object.

[class.mem/20]

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

This says that we can at least convert an ID_t* to an int* via reinterpret_cast.

Now, you assert that sizeof(ID_t) == sizeof(int). This is good news because it limits your options. int* someIdAsInt = reinterpret_cast<int*>(&someId) is guaranteed to succeed, and it will point at the first member, per class.mem. So the question is, what are the possible addresses that can be returned? Obviously, there is only one address which can possibly be the first byte of sizeof(int) bytes, which is, of course, the address of someId.

So we can be certain that &someId and someIdAsInt refer to the same address. And, in particular, someIdAsInt must point at the initial member per class.mem.

If I were to do *someIdAsInt = 43, the result must be the same as if I did someId.raw_id = 43, because someIdAsInt points at someId.raw_id. This statement must be true no matter what I do with this pointer to obscure it.

This says that either *someIdAsInt and someId must have the same layout (permitting the assignment), or the compiler must track the value of someIdAsInt, treating it differently from a normal int*. This is why I depart from eerorika's answer. This information could not be handled in the type system with type tagging (it would force the compiler to be able to track tags, even if you did brutal things like passing an int* between threads). So any tagging information must be baked into the bytes forming the value of the int*. The C++ spec does not say anything about the format of a pointer's value.

However, there are limits to how different int* can be, which are generally speaking, undisputed. The key one is that I can use std::memcpy to copy the bytes of one int into another, and the resulting integer must be the same value. To the best of my knowledge, this is not actually written into the spec, but it is accepted by (basically?) all programmers as a common law rule of C and C++. Indeed this sort of thing is further emphasized by the inclusion of std::bit_cast in C++20. To have two integer formats which cannot be distinguished by their bytes would break all sorts of things.

So, if you accept this common law ruling in a language-lawyer argument, then the layout of your ID_t must be identical to the layout of int if sizeof(ID_t) == sizeof(int). If that common law ruling is not accepted then... well... I'd just say some soul searching is in order =D

Note that this does not mean that you can safely go the other way. If you have an int array, you cannot cast it to ID_t* and then access the elements through that pointer. That would be a violation of strict aliasing, as there was never an ID_t at that memory address in the first place. However, because the layouts are identical, using std::memcpy or std::bit_cast to convert to an ID_t with an equivalent bit pattern would still be fair game.
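A minimal sketch of the std::memcpy route, assuming ID_t is a single-int struct as in the question; to_id is a hypothetical helper name. It copies the bytes of an int into a real ID_t object instead of reinterpreting int storage, which sidesteps the aliasing problem.

```cpp
#include <cassert>      // assert, for the usage check mentioned below
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Assumed shape of the question's ID_t.
struct ID_t {
  int raw_id;
};

static_assert(sizeof(ID_t) == sizeof(int), "precondition from the question");
static_assert(std::is_trivially_copyable<ID_t>::value,
              "std::memcpy into ID_t requires a trivially copyable type");

// Hypothetical helper: builds an ID_t from the bytes of an int.
// No ID_t is ever accessed through an int*, or the other way around.
ID_t to_id(int raw) {
  ID_t id;
  std::memcpy(&id, &raw, sizeof id);
  return id;
}
```

Under the identical-layout argument above, to_id(42).raw_id is 42. With a C++20 compiler, std::bit_cast<ID_t>(raw) from <bit> expresses the same conversion and can be used in constant expressions.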

Cort Ammon answered Nov 19 '22 20:11


No, it is not guaranteed. Simple counter-example:

#include <cstdio>
struct S {
    int s;
} __attribute__ ((aligned (8)));
int main() { std::printf("%zu %zu\n", sizeof(S), sizeof(int)); } // %zu is the correct format for size_t

Prints 8 and 4 on my machine. __attribute__ is non-standard syntax but there is no guarantee that gcc won't change to eight-byte-alignment by default in the future.

Edit: Given the precondition that the struct and the int are always the same size, identical binary layout is indeed guaranteed, at least in any implementation that is the least bit sensible.
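That precondition can be enforced at compile time. The sketch below is one possible guard, not something from the answers: wraps_without_padding, ID_t, and Padded are hypothetical names, and the standard alignas specifier is used in place of the GCC attribute so the counter-example stays portable.

```cpp
#include <type_traits>  // std::is_standard_layout, std::is_trivially_copyable

// Hypothetical guard: true when Wrapper has the same size and alignment as Raw
// and is eligible for the standard-layout and memcpy arguments made above.
template <typename Wrapper, typename Raw>
constexpr bool wraps_without_padding() {
  return std::is_standard_layout<Wrapper>::value &&
         std::is_trivially_copyable<Wrapper>::value &&
         sizeof(Wrapper) == sizeof(Raw) &&
         alignof(Wrapper) == alignof(Raw);
}

// Assumed shape of the question's ID_t.
struct ID_t {
  int raw_id;
};

// Over-aligned variant, analogous to the __attribute__ ((aligned (8))) example.
struct alignas(2 * alignof(int)) Padded {
  int s;
};

static_assert(wraps_without_padding<ID_t, int>(),
              "ID_t must be a padding-free wrapper around int");
static_assert(!wraps_without_padding<Padded, int>(),
              "the over-aligned struct is rejected by the guard");
```

If a future compiler or flag changed the default alignment of such structs, the first static_assert would turn the surprise into a build error instead of a silent layout change.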

Björn Lindqvist answered Nov 19 '22 19:11