Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unions, aliasing and type-punning in practice: what works and what does not?

I have a problem understanding what can and cannot be done using unions with GCC. I read the questions (in particular here and here) about it but they focus the C++ standard, I feel there's a mismatch between the C++ standard and the practice (the commonly used compilers).

In particular, I recently found confusing informations in the GCC online doc while reading about the compilation flag -fstrict-aliasing. It says:

-fstrict-aliasing

Allow the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type. Pay special attention to code like this:

union a_union {
  int i;
  double d;
};

int f() {
  union a_union t;
  t.d = 3.0;
  return t.i;
}

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.

This is what I think I understood from this example and my doubts:

1) aliasing only works between similar types, or char

Consequence of 1): aliasing - as the word suggests - is when you have one value and two members to access it (i.e. the same bytes);

Doubt: are two types similar when they have the same size in bytes? If not, what are similar types?

Consequence of 1) for non similar types (whatever this means), aliasing does not work;

2) type punning is when we read a different member than the one we wrote to; it's common and it works as expected as long as the memory is accessed through the union type;

Doubt: is aliasing a specific case of type-punning where types are similar?

I get confused because it says unsigned int and double are not similar, so aliasing does not work; then in the example it's aliasing between int and double and it clearly says it works as expected, but calls it type-punning: not because types are or are not similar, but because it's reading from a member it did not write. But reading from a member it did not write is what I understood aliasing is for (as the word suggests). I'm lost.

The questions: can someone clarify the difference between aliasing and type-punning and what uses of the two techniques are working as expected in GCC? And what does the compiler flag do?

like image 919
L.C. Avatar asked Feb 19 '19 08:02

L.C.


2 Answers

Aliasing can be taken literally for what it means: it is when two different expressions refer to the same object. Type-punning is to "pun" a type, ie to use a object of some type as a different type.

Formally, type-punning is undefined behaviour with only a few exceptions. It happens commonly when you fiddle with bits carelessly

int mantissa(float f)
{
    return (int&)f & 0x7FFFFF;    // Accessing a float as if it's an int
}

The exceptions are (simplified)

  • Accessing integers as their unsigned/signed counterparts
  • Accessing anything as a char, unsigned char or std::byte

This is known as the strict-aliasing rule: the compiler can safely assume two expressions of different types never refer to the same object (except for the exceptions above) because they would otherwise have undefined behaviour. This facilitates optimizations such as

void transform(float* dst, const int* src, int n)
{
    for(int i = 0; i < n; i++)
        dst[i] = src[i];    // Can be unrolled and use vector instructions
                            // If dst and src alias the results would be wrong
}

What gcc says is it relaxes the rules a bit, and allows type-punning through unions even though the standard doesn't require it to

union {
    int64_t num;
    struct {
        int32_t hi, lo;
    } parts;
} u = {42};
u.parts.hi = 420;

This is the type-pun gcc guarantees will work. Other cases may appear to work but may one day silently be broken.

like image 109
Passer By Avatar answered Sep 17 '22 13:09

Passer By


Terminology is a great thing, I can use it however I want, and so can everyone else!

are two types similar when they have the same size in bytes? If not, what are similar types?

Roughly speaking, types are similar when they differ by constness or signedness. Size in bytes alone is definitely not sufficient.

is aliasing a specific case of type-punning where types are similar?

Type punning is any technique that circumvents the type system.

Aliasing is a specific case of that which involves placing objects of different types at the same address. Aliasing is generally allowed when types are similar, and forbidden otherwise. In addition, one may access an object of any type through a char (or similar to char) lvalue, but doing the opposite (i.e. accessing an object of type char through a dissimilar type lvalue) is not allowed. This is guaranteed by both C and C++ standards, GCC simply implements what the standards mandate.

GCC documentation seems to use "type punning" in a narrow sense of reading a union member other than the one last written to. This kind of type punning is allowed by the C standard even when types are not similar. OTOH the C++ standard does not allow this. GCC may or may not extend the permission to C++, the documentation is not clear on this.

Without -fstrict-aliasing, GCC apparently relaxes these requirements, but it isn't clear to what exact extent. Note that -fstrict-aliasing is the default when performing an optimised build.

Bottom line, just program to the standard. If GCC relaxes the requirements of the standard, it isn't significant and isn't worth the trouble.

like image 34
n. 1.8e9-where's-my-share m. Avatar answered Sep 21 '22 13:09

n. 1.8e9-where's-my-share m.