Strict aliasing and references to compile-time C arrays

Given the following code

#include <cassert>
#include <climits>
#include <cstdint>
#include <iostream>

static_assert(CHAR_BIT == 8, "A byte does not consist of 8 bits");

void func1(const int32_t& i)
{
    const unsigned char* j = reinterpret_cast<const unsigned char*>(&i);
    for(int k = 0; k < 4; ++k)
        std::cout << static_cast<int>(j[k]) << ' ';
    std::cout << '\n';
}

void func2(const int32_t& i)
{
    const unsigned char (&j)[4] = reinterpret_cast<const unsigned char (&)[4]>(i);
    for(int k = 0; k < 4; ++k)
        std::cout << static_cast<int>(j[k]) << ' ';
    std::cout << '\n';
}

int main() {
    func1(-1);
    func2(-1);
}

From the language rules it is clear that func1 is fine, as pointers to unsigned char can alias any other type. My question is: does this extend to C++ references to C-arrays with known length? Intuitively I would say yes. Is func2 well-defined or does it trigger undefined behavior?

I have tried compiling the above code with Clang and GCC, using every combination of -Wextra, -Wall, -Wpedantic and UBSan, and got no warnings and always the same output. That obviously doesn't prove the absence of UB, but I couldn't trigger any of the usual strict-aliasing optimization bugs.

Jonas Müller asked Oct 16 '19 12:10


1 Answer

It's undefined behavior.

On the meaning of reinterpret_cast here we have [expr.reinterpret.cast]

11 A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [ Note: That is, for lvalues, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)).  — end note ] No temporary is created, no copy is made, and constructors or conversion functions are not called.

This tells us that the cast in func2 is valid so long as reinterpret_cast<const unsigned char (*)[4]>(&i) is valid. No shock here. But the crux of the matter is that you may not get anything meaningful out of that pointer conversion. On that subject we have this over at [basic.compound]:

4 Two objects a and b are pointer-interconvertible if:

  • they are the same object, or
  • one is a standard-layout union object and the other is a non-static data member of that object ([class.union]), or
  • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, the first base class subobject of that object ([class.mem]), or
  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast. [ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address.  — end note ]
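For contrast, here is a minimal sketch of a conversion that the list does cover: a standard-layout class object and its first non-static data member (the third bullet). The names Wrapper and read_first are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical standard-layout type, used only for illustration.
struct Wrapper {
    std::int32_t first;
    char second;
};

static_assert(std::is_standard_layout<Wrapper>::value,
              "Wrapper must be standard-layout for the cast below");

// A Wrapper object and its first non-static data member are
// pointer-interconvertible, so the reinterpret_cast below yields a
// pointer to the member itself.
std::int32_t read_first(const Wrapper& w)
{
    // Pointer-interconvertible objects share an address.
    assert(static_cast<const void*>(&w) == static_cast<const void*>(&w.first));
    return *reinterpret_cast<const std::int32_t*>(&w);
}
```

Calling read_first on a Wrapper whose first member is 42 returns 42. No comparable rule connects an int32_t object to an unsigned char[4] array object.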

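The list quoted from [basic.compound] is an exhaustive list of meaningful pointer conversions. An int32_t object and an unsigned char[4] array object match none of its cases, so we are not permitted to obtain an array address like that, and the result of the cast is not a valid array glvalue. Therefore any further use you make of the result of the cast is undefined.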
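If an actual array of bytes is what you want, one way to sidestep the problem is to copy the object representation into a real array instead of reinterpreting the int32_t in place. The following is only a sketch of that idea; to_bytes and check_minus_one are hypothetical names, not anything from the question.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the object representation of i into a
// genuine array object, so indexing the result is well-defined.
std::array<unsigned char, sizeof(std::int32_t)> to_bytes(std::int32_t i)
{
    std::array<unsigned char, sizeof(std::int32_t)> bytes{};
    std::memcpy(bytes.data(), &i, sizeof i);
    return bytes;
}

// Example use: int32_t is two's complement without padding bits, so with
// 8-bit bytes (as the question asserts) every byte of -1 is 0xFF,
// regardless of endianness.
void check_minus_one()
{
    for (unsigned char b : to_bytes(-1))
        assert(b == 0xFF);
}
```

The copy is four bytes, and compilers typically optimize such a memcpy well. In C++20, std::bit_cast offers a similar copy-based conversion. The unsigned char pointer approach from func1 also remains available when no copy is wanted.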

StoryTeller - Unslander Monica answered Oct 22 '22 07:10