Packing and pointer aliasing, C and C++

Tags: c++, c

union vec
{
#pragma pack(push,1)
   struct
   {
      float x, y, z;
   };
#pragma pack(pop)
   float vals[3];
};

Consider the above definition. (Set aside the fact that anonymous structs are not standard in C99.)

I suppose this question possibly permits different answers depending on choice of compiler, choice of language, and choice of standard.

  1. I believe I am guaranteed (via #pragma compiler documentation, not language guarantee) that sizeof(vec) == 3*sizeof(float)
  2. As such, I believe I am guaranteed that &vec.x == &vec.vals[0], etc.
  3. However, I am unsure whether it is legal (that is, permitted under strict aliasing) to write to v.x and then read from v.vals[0]

Packing aside, I believe the relevant verbiage (from the C99 standard, at least) is:

  • a type compatible with the effective type of the object,

  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

zac asked Sep 20 '17 13:09




1 Answer

  1. I believe I am guaranteed (via #pragma compiler documentation, not language guarantee) that sizeof(vec) == 3*sizeof(float)

Yes that's correct, assuming the #pragma disabled padding entirely.
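A minimal sketch of how this size assumption could be checked at compile time, assuming a C11 compiler that supports #pragma pack and anonymous structs. The helper name vec_size is hypothetical and exists only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Same layout as the union in the question. */
union vec {
#pragma pack(push, 1)
    struct {
        float x, y, z;
    };
#pragma pack(pop)
    float vals[3];
};

/* Compilation fails here if the compiler inserted any padding. */
static_assert(sizeof(union vec) == 3 * sizeof(float),
              "union vec contains unexpected padding");

/* Hypothetical helper: reports the size so it can also be checked at run time. */
size_t vec_size(void)
{
    return sizeof(union vec);
}
```

Because the check is a static assertion, a compiler or packing setting that breaks the assumption is reported at build time rather than showing up later as a silent layout mismatch.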


  2. As such, I believe I am guaranteed that &vec.x == &vec.vals[0], etc.

This is guaranteed regardless of padding, because there can never be padding at the beginning of the struct/union. See for example C11 6.7.2.1 §15:

There may be unnamed padding within a structure object, but not at its beginning.

This holds true for all versions of the C standard, and as far as I know, also for all versions of the C++ standard.
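The address guarantee for the first member can be illustrated with a small sketch. The function names below are hypothetical. Note that only the first comparison follows from the "no padding at the beginning" rule. The comparisons for y and z additionally assume that the struct has no internal padding.

```c
#include <assert.h>
#include <stdbool.h>

union vec {
#pragma pack(push, 1)
    struct {
        float x, y, z;
    };
#pragma pack(pop)
    float vals[3];
};

/* Guaranteed by the standard: no padding before the first member. */
bool first_member_coincides(const union vec *v)
{
    return (const void *)&v->x == (const void *)&v->vals[0];
}

/* Holds only if the struct has no internal padding (an assumption here). */
bool all_members_coincide(const union vec *v)
{
    return (const void *)&v->x == (const void *)&v->vals[0]
        && (const void *)&v->y == (const void *)&v->vals[1]
        && (const void *)&v->z == (const void *)&v->vals[2];
}
```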


  3. However, I am unsure whether it is legal (that is, permitted under strict aliasing) to write to v.x and then read from v.vals[0]

This is fine in C but undefined behavior in C++.

In C, the . and -> operators guarantee this, C11 6.5.2.3:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue.

Where footnote 95 (informative, not normative) says:

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

C++ has no such guarantee, so "type punning" through unions is undefined behavior in C++. This is a major difference between the two languages.
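As a sketch of the C case described above, the following writes through the struct member and reads through the array member. It is meant to be compiled as C (C11 or later for the anonymous struct). The function name write_x_read_vals0 is hypothetical.

```c
#include <assert.h>

union vec {
#pragma pack(push, 1)
    struct {
        float x, y, z;
    };
#pragma pack(pop)
    float vals[3];
};

/* Store through one union member, read through the other.
 * In C the stored bytes are reinterpreted in the type of the member
 * that is read; both members are float here, so the value is unchanged. */
float write_x_read_vals0(float value)
{
    union vec v;
    v.x = value;        /* write via the struct member */
    return v.vals[0];   /* read via the array member */
}
```

The same source compiled as C++ would read a member that is not the last one written, which is the case the answer describes as undefined there.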

Furthermore, C has the concept of common initial sequence for unions, also specified in C11 6.5.2.3:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
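A minimal sketch of the common initial sequence rule, using hypothetical types (circle, rect, shape) that are not part of the question. Both structs begin with an int member, so that member may be inspected through either struct while the union holds the other one.

```c
#include <assert.h>

/* Hypothetical structs sharing the common initial sequence { int kind; }. */
struct circle { int kind; float radius; };
struct rect   { int kind; float width, height; };

/* The completed union type is visible where the members are inspected. */
union shape {
    struct circle c;
    struct rect   r;
};

/* Reads the shared first member through 'c', whichever struct was stored. */
int shape_kind(const union shape *s)
{
    return s->c.kind;
}
```

Note that the union in the question holds one struct and one array rather than several structs, so this particular rule does not cover it directly.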


It is true that the array and the struct in your example may alias, because of the part you cited: "an aggregate or union type that includes one of the aforementioned types among its members". So writing to the struct and then reading that data through the array does not violate strict aliasing in either C or C++.

However, C++ has the concept of an "active member" when dealing with unions, so in C++ this would give poorly specified behavior for reasons other than aliasing: C++ only guarantees that the last written member of the union can be safely read.

Lundin answered Nov 03 '22 02:11