
Would reinterpreting data be undefined behavior?

Someone recently brought it up that this:

uint8_t a = 0b10000000;
int8_t b = *(int8_t*) &a;

is undefined behavior, because the value of a is outside of what I can represent in int8_t. Can someone explain why exactly this is undefined behavior?

My main issue is that the memory is there and is valid as storage for an int8_t; the only difference is that int8_t interprets that byte as -128, while uint8_t interprets it as 128. I am further confused because the fast inverse square root uses:

float y = /* some value */;
int32_t i = *(int32_t*) &y;

This will give a value of i in essence unrelated (apart from the IEEE standard) to y, so I don't see why reinterpreting a piece of memory could be undefined behavior.

Lala5th asked Aug 10 '21 18:08


2 Answers

Thanks for all the comments. I went down a rabbit hole of strict aliasing and found that, despite my beliefs, the fast inverse square root is undefined behavior, but my initial code does not seem to be. That is not because uint8_t is special, but because the standard has a rule for signed/unsigned interchange:

If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined: [...] (11.2) a type that is the signed or unsigned type corresponding to the dynamic type of the object

So there is no issue in theory, as uint8_t is the unsigned type corresponding to int8_t.
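
A minimal sketch of the case the quoted rule covers, assuming a platform that provides int8_t and uint8_t (they are optional types). The helper names reinterpret_as_signed and convert_to_signed are made up for illustration. The second helper shows a plain value conversion, which needs no pointer cast at all and gives the same result under C++20's modular conversion rules (before C++20 that conversion was implementation-defined).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads a uint8_t object through an int8_t lvalue.
// The aliasing rule permits this access because int8_t is the signed
// type corresponding to uint8_t.
inline std::int8_t reinterpret_as_signed(const std::uint8_t& a) {
    return *reinterpret_cast<const std::int8_t*>(&a);
}

// Hypothetical helper: an ordinary value conversion, no aliasing involved.
inline std::int8_t convert_to_signed(std::uint8_t a) {
    return static_cast<std::int8_t>(a);
}

// Usage sketch: 0b10000000 is 128 as uint8_t and -128 as int8_t
// (int8_t is two's complement wherever it exists).
inline void demo_signed_view() {
    std::uint8_t a = 0b10000000;
    assert(reinterpret_as_signed(a) == -128);
    assert(convert_to_signed(a) == -128);
}
```

If all you need is the value, the static_cast version is probably the simpler choice, since it does not depend on the aliasing rules at all.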

Lala5th answered Oct 23 '22 04:10


The problem is not the reinterpretation of data, but the reinterpretation of the pointer. This is problematic for the following, non-exhaustive list of reasons:

  • The standard does not require that all pointers be the same size, so sizeof(float*) does not have to equal sizeof(int*), and the conversion may simply lose data.
  • If you grab a uint32_t* from a float* and read from it, you would be reading a uint32_t that was never created.
  • As you said, compilers assume two pointers of different types (except char*, unsigned char* and std::byte*) never alias, and perform optimizations with this information.
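
The unsigned char* exception in the last point can be sketched as follows. This is only an illustration, assuming a 32-bit float with the IEEE-754 binary32 layout and the same byte order for float and uint32_t; the name bits_via_bytes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical helper: copies the bytes of a float into a uint32_t, touching
// both objects only through unsigned char*, which may alias any object type.
inline std::uint32_t bits_via_bytes(const float& f) {
    std::uint32_t out = 0;
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&f);
    unsigned char* dst = reinterpret_cast<unsigned char*>(&out);
    for (std::size_t i = 0; i < sizeof out; ++i) {
        dst[i] = src[i];  // byte-wise copy; the float is never read as an integer
    }
    return out;
}

// Usage sketch (assumes IEEE-754 binary32): 1.0f has the pattern 0x3F800000.
inline void demo_bits_via_bytes() {
    assert(bits_via_bytes(1.0f) == 0x3F800000u);
}
```

This is essentially what memcpy does for you, so in real code memcpy is usually the more readable option.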

However, sometimes converting between the bit representations of unrelated types is a legit requirement. Traditionally, this has been done using memcpy, but C++20 added std::bit_cast (in <bit>), which can do this reinterpretation even in a constant expression, so the following is legal, and expresses our intention directly:

#include <bit>      // std::bit_cast (C++20)
#include <cstdint>  // uint32_t
constexpr float pi = 3.14f;
constexpr uint32_t pi_bits = std::bit_cast<uint32_t>(pi);
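
For compilers without C++20, the traditional memcpy route might look like the sketch below, applied to the fast inverse square root from the question. It assumes a 32-bit IEEE-754 float and a positive, finite, normal input; float_to_bits, bits_to_float and fast_inv_sqrt are hypothetical names, and the magic constant is the widely published one rather than anything derived here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical helpers: copy the object representation instead of casting pointers.
inline std::uint32_t float_to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Fast inverse square root with the pointer casts replaced by memcpy.
// Intended for positive, finite, normal x; performs one Newton-Raphson step.
inline float fast_inv_sqrt(float x) {
    std::uint32_t i = 0x5f3759dfu - (float_to_bits(x) >> 1);  // initial guess
    float y = bits_to_float(i);
    return y * (1.5f - 0.5f * x * y * y);                     // refine once
}

// Usage sketch: the result is an approximation, so compare with a tolerance.
inline void demo_fast_inv_sqrt() {
    assert(float_to_bits(1.0f) == 0x3F800000u);              // IEEE-754 pattern of 1.0f
    assert(std::fabs(fast_inv_sqrt(4.0f) - 0.5f) < 0.005f);  // close to 1/sqrt(4)
}
```

Optimizing compilers typically turn these small fixed-size memcpy calls into a plain register move, so this form should not cost anything over the pointer cast.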
Fatih BAKIR answered Oct 23 '22 06:10