
Is the following C union access pattern undefined behavior?

The following is not undefined behavior in modern C:

union foo
{
    int i;
    float f;
};
union foo bar;
bar.f = 1.0f;
printf("%08x\n", bar.i);

and prints the hex representation of 1.0f.

However the following is undefined behavior:

int x;
printf("%08x\n", x);

What about this?

union xyzzy
{
    char c;
    int i;
};
union xyzzy plugh;

Reading plugh.i at this point ought to be undefined behavior, since no member of plugh has been written:

printf("%08x\n", plugh.i);

But what about this? Is it undefined behavior or not?

plugh.c = 'A';
printf("%08x\n", plugh.i);

On most current C implementations sizeof(char) < sizeof(int), with sizeof(int) being either 2 or 4. In those cases, at most 50% or 25% of plugh.i will have been written. Reading the remaining bytes reads uninitialized data, and hence should be undefined behavior. On that basis, is the entire read undefined behavior?

dgnuff asked Sep 12 '18 08:09



2 Answers

Defect report 283: Accessing a non-current union member ("type punning") covers this. It tells us that the behavior is undefined if the result is a trap representation.

The defect report asked:

In the paragraph corresponding to 6.5.2.3#5, C89 contained this sentence:

With one exception, if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.

Associated with that sentence was this footnote:

The "byte orders" for scalar types are invisible to isolated programs that do not indulge in type punning (for example, by assigning to one member of a union and inspecting the storage by accessing another member that is an appropriately sized array of character type), but must be accounted for when conforming to externally imposed storage layouts.

The only corresponding verbiage in C99 is 6.2.6.1#7:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values, but the value of the union object shall not thereby become a trap representation.

It is not perfectly clear that the C99 words have the same implications as the C89 words.

The defect report added the following footnote:

Attach a new footnote 78a to the words "named member" in 6.5.2.3#3:

78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
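To make the footnote concrete, here is a minimal sketch of punning a float through a union. The helper name float_bits is hypothetical, and the sketch assumes float is 32 bits wide. It reads through uint32_t rather than int because the exact-width types have no padding bits, so the reinterpreted bytes cannot form a trap representation.

```c
#include <stdint.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32 on most platforms). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: store a float in one union member and read the
   same bytes back through the other member. */
static uint32_t float_bits(float value)
{
    union
    {
        float f;
        uint32_t u;
    } pun;

    pun.f = value; /* last member stored */
    return pun.u;  /* different member read: bytes are reinterpreted */
}
```

On an implementation that uses IEEE 754 binary32 for float, float_bits(1.0f) yields 0x3f800000, which is the value the question's first example prints.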

C11 6.2.6.1 General tells us:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. 50) Such a representation is called a trap representation.

Shafik Yaghmour answered Oct 25 '22 17:10


From 6.2.6.1 §7 :

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

So, the value of plugh.i would be unspecified after setting plugh.c.
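One way to look at this without risking a trap representation is to inspect the union's bytes through a character type. The sketch below is only an illustration: the helper name xyzzy_bytes is hypothetical, and the union is the one from the question.

```c
#include <string.h>

union xyzzy
{
    char c;
    int i;
};

/* Hypothetical helper: copy the object representation of *u into out,
   which must have room for sizeof(union xyzzy) bytes. memcpy treats the
   bytes as unsigned char, and unsigned char has no trap representations. */
static void xyzzy_bytes(const union xyzzy *u, unsigned char *out)
{
    memcpy(out, u, sizeof *u);
}
```

After plugh.c = 'A', calling xyzzy_bytes(&plugh, out) gives out[0] == 'A', because every union member starts at offset 0. The remaining sizeof(int) - 1 bytes hold unspecified values, so a program should not rely on them.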

From a footnote to 6.5.2.3 §3 :

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

This says that type punning is specifically allowed (as you asserted in your question). But it might result in a trap representation, in which case reading the value has undefined behavior according to 6.2.6.1 §5 :

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. 50) Such a representation is called a trap representation.

If it's not a trap representation, there seems to be nothing in the standard that would make this undefined behavior, because from 4 §3, we get :

A program that is correct in all other aspects, operating on correct data, containing unspecified behavior shall be a correct program and act in accordance with 5.1.2.3.
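If the goal is a fully determined value rather than an unspecified one, one option is to skip the union and copy the char into an integer whose bytes are all known beforehand. This is a sketch under assumptions, not something taken from the standard text quoted above: the helper name char_in_zeroed_uint is hypothetical, and it assumes unsigned int has no padding bits, which holds on mainstream implementations.

```c
#include <string.h>

/* Hypothetical helper: place c in the first byte of an otherwise
   all-zero unsigned int.
   Assumption: unsigned int has no padding bits. */
static unsigned int char_in_zeroed_uint(char c)
{
    unsigned int value = 0;       /* every byte starts out as zero */
    memcpy(&value, &c, sizeof c); /* overwrite only the first byte */
    return value;
}
```

The numeric result still depends on byte order: with a 4-byte unsigned int and ASCII, char_in_zeroed_uint('A') is 0x41 on a little-endian machine and 0x41000000 on a big-endian one. Either way, no byte of the result is unspecified.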

Sander De Dycker answered Oct 25 '22 18:10