Common initial sequence and alignment

While thinking of a counter-example for this question, I came up with:

struct A
{
    alignas(2) char byte;
};

But if that's legal and standard-layout, is it layout-compatible to this struct B?

struct B
{
    char byte;
};

Furthermore, if we have

struct A
{
    alignas(2) char x;
    alignas(4) char y;
};
// possible alignment, - is padding
// 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15
//  x  -  -  -  y  -  -  -  x  -  -  -  y  -  -  -

struct B
{
    char x;
    char y;
}; // no padding required

union U
{
    A a;
    B b;
} u;

Is there a common initial sequence for A and B? If so, does it include A::y and B::y? That is, may we write the following without invoking undefined behaviour?

u.a.y = 42;
std::cout << u.b.y;

(answers for C++1y / "fixed C++11" also welcome)


  • See [basic.align] for alignment and [dcl.align] for the alignment-specifier.

  • [basic.types]/11 says for fundamental types "If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types." (an underlying question is whether A::byte and B::byte have layout-compatible types)

  • [class.mem]/16 "Two standard-layout struct types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types."

  • [class.mem]/18 "Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members."

  • [class.mem]/18 "If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them."
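The quoted rules can be probed with a compiler for the first A/B pair. This is a minimal sketch, not a statement about what the standard intends: the names A1, B1 and print_first_pair are hypothetical, a C++11 compiler is assumed, and the printed values only report what that one implementation does.

```cpp
#include <iostream>
#include <type_traits>

// Hypothetical names: A1 and B1 stand in for the first A/B pair.
struct A1 { alignas(2) char byte; };
struct B1 { char byte; };

// The definition of a standard-layout class does not mention
// alignment-specifiers, so both checks are expected to pass.
static_assert(std::is_standard_layout<A1>::value, "A1 should be standard-layout");
static_assert(std::is_standard_layout<B1>::value, "B1 should be standard-layout");

// Both members have the same declared type, char.
static_assert(std::is_same<decltype(A1::byte), decltype(B1::byte)>::value,
              "both members are declared as char");

// Prints the size and alignment the implementation chose for each struct.
void print_first_pair() {
    std::cout << "sizeof(A1)=" << sizeof(A1) << " alignof(A1)=" << alignof(A1) << '\n'
              << "sizeof(B1)=" << sizeof(B1) << " alignof(B1)=" << alignof(B1) << '\n';
}
```

Calling print_first_pair() from main shows whether the two structs differ in size and alignment even though their members have the same type.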

Of course, on a language-lawyer level, another question is what it means that the inspection of the common initial sequence is "permitted". I guess some other paragraph might make the above read of u.b.y undefined behaviour (reading from an uninitialized object).

dyp asked Feb 01 '14 15:02


2 Answers

It looks like a hole in the standard. The responsible thing would be to file a defect report.

Several things, though:

  • Your first example doesn't really demonstrate a problem. Adding a short after the char would also have the effect of aligning the char to a 2-byte boundary, without changing the common initial sequence.
  • alignas is not C++-only; C11 gained the equivalent _Alignas specifier (spelled alignas via <stdalign.h>) at the same time. Since the standard-layout property is a cross-language compatibility facility, it is probably preferable to require corresponding alignment specifiers to match rather than to disqualify any class that has an alignment-specifier on a non-static member.
  • There would be no problem if the member alignment specifiers appertained to the types of the members. Other problems may result from the lack of adjustment to types: for example, a function parameter in ret fn( alignas(4) char ) may need to be mangled for the ABI to process it correctly, but the language might not provide for such an adjustment.
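The first bullet can be checked directly. The sketch below uses hypothetical names (CharOnly, CharShort, print_char_short) and assumes a C++11 compiler with no packing pragmas in effect; it shows that a trailing short raises the struct's alignment, and with it the alignment of the leading char, without any alignas.

```cpp
#include <cstddef>
#include <iostream>
#include <type_traits>

// Hypothetical names for illustration.
struct CharOnly  { char byte; };
struct CharShort { char byte; short s; };

static_assert(std::is_standard_layout<CharShort>::value,
              "CharShort should be standard-layout");

// Without packing, a struct is at least as strictly aligned as its
// strictest member, so the leading char inherits short's alignment.
static_assert(alignof(CharShort) >= alignof(short),
              "struct alignment covers the short member");

// The first member of a standard-layout struct sits at offset 0.
static_assert(offsetof(CharShort, byte) == 0, "byte is the first member");
static_assert(offsetof(CharOnly, byte) == 0, "byte is the first member");

// Prints the alignment of both structs on this implementation.
void print_char_short() {
    std::cout << "alignof(CharOnly)=" << alignof(CharOnly)
              << " alignof(CharShort)=" << alignof(CharShort) << '\n';
}
```

Both structs begin with a plain char member, so the initial member is declared identically in each, yet its effective alignment differs between the two.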
Potatoswatter answered Sep 21 '22 14:09


I can't speak for the C++11 standard, but I am a firmware/microchip programmer and have had to use features of this kind that have existed for a long time (pragma pack, alignment attributes).

Using alignas cannot be considered "standard layout", so none of those implications apply. Standard layout means one fixed alignment distribution per architecture (usually everything is align(min(sizeof,4)), or on some architectures align(8)). The standard probably wants to say what is obvious: without special features (align, pack), structures are compatible on the same architecture if they appear to be the same (same types in the same order). Otherwise they may or may not be compatible, depending on the architecture (compatible on one architecture but different on another).

Consider this struct:

struct foo{ char b; short h; double d; int i; };

On one architecture (e.g. x86 32bit) it is what it seems to be, but on Itanium or ARM it actually looks like this:

struct foo{ char b, _hidden_b; short h; int _maybe_hidden_h; double d; int i; };

Notice _maybe_hidden_h: it is absent under the older AEABI (alignment capped at 4) and present where double needs 64-bit (8-byte) alignment.

x86 Standard Layout (pack(1)):

alignas(1) char b; alignas(1) short h; alignas(1) double d; alignas(1) int i;  

32bit Alignment Standard Layout (pack(4) - ARM architecture, older version - EABI):

alignas(1) char b; alignas(2) short h; alignas(4) double d; alignas(4) int i;

64bit Alignment Standard Layout (pack(8) - Itanium and newer ARM/AEABI):

alignas(1) char b; alignas(2) short h; alignas(8) double d; alignas(4) int i;
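Rather than relying on the per-architecture tables above, the actual layout of foo can be printed on whatever target is at hand. This is only a sketch: print_foo_layout is a hypothetical helper, default compiler settings (no pack pragma) are assumed, and the numbers it prints are specific to the compiler and ABI in use.

```cpp
#include <cstddef>
#include <iostream>

struct foo { char b; short h; double d; int i; };

// Members with the same access control are laid out in declaration
// order, so the offsets must be strictly increasing.
static_assert(offsetof(foo, b) == 0, "b is the first member");
static_assert(offsetof(foo, b) < offsetof(foo, h), "h follows b");
static_assert(offsetof(foo, h) < offsetof(foo, d), "d follows h");
static_assert(offsetof(foo, d) < offsetof(foo, i), "i follows d");

// Prints each member's offset; gaps between an offset and the end of
// the previous member are the hidden padding bytes.
void print_foo_layout() {
    std::cout << "b at " << offsetof(foo, b) << '\n'
              << "h at " << offsetof(foo, h) << '\n'
              << "d at " << offsetof(foo, d) << '\n'
              << "i at " << offsetof(foo, i) << '\n'
              << "sizeof(foo)=" << sizeof(foo)
              << " alignof(foo)=" << alignof(foo) << '\n';
}
```

Comparing the output of print_foo_layout() across targets shows where each ABI inserts padding.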

To your example:
offsetof(A,y) = 4 while offsetof(B,y) = 1 (B needs no padding), and the union does not change that (thus &u.a.y != &u.b.y)
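The offsets for the question's second example can be verified with offsetof. In this sketch, A, B and U are copied from the question, while offset_a_y, offset_b_y, y_members_coincide and print_y_offsets are hypothetical helper names; it assumes a C++11 compiler and says nothing about whether the read in the question is defined, only where the members are placed.

```cpp
#include <cstddef>
#include <iostream>

struct A { alignas(2) char x; alignas(4) char y; };
struct B { char x; char y; };
union U { A a; B b; };

// All members of a union start at the union's own address, so the byte
// position of u.a.y and u.b.y inside u equals their offset in A and B.
constexpr std::size_t offset_a_y = offsetof(A, y);
constexpr std::size_t offset_b_y = offsetof(B, y);

// A::y is 4-byte aligned and comes after x, so it cannot sit at offset 0.
static_assert(offset_a_y % 4 == 0 && offset_a_y != 0,
              "A::y lies on a non-zero 4-byte boundary");

// True only if both y members occupy the same byte of the union.
constexpr bool y_members_coincide() { return offset_a_y == offset_b_y; }

// Prints both offsets and whether they coincide on this implementation.
void print_y_offsets() {
    std::cout << "offsetof(A, y)=" << offset_a_y
              << " offsetof(B, y)=" << offset_b_y
              << " coincide=" << (y_members_coincide() ? "yes" : "no") << '\n';
}
```

If the two offsets differ, a write through u.a.y and a read through u.b.y touch different bytes of the union.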

firda answered Sep 22 '22 14:09