Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

reinterpret_cast, char*, and undefined behavior

What are the cases where reinterpret_casting a char* (or char[N]) is undefined behavior, and when is it defined behavior? What is the rule of thumb I should be using to answer this question?


As we learned from this question, the following is undefined behavior:

alignas(int) char data[sizeof(int)];
int *myInt = new (data) int;           // OK
*myInt = 34;                           // OK
int i = *reinterpret_cast<int*>(data); // <== UB! have to use std::launder

But at what point can we do a reinterpret_cast on a char array and have it NOT be undefined behavior? Here are a few simple examples:

  1. No new, just reinterpret_cast:

    alignas(int) char data[sizeof(int)];
    *reinterpret_cast<int*>(data) = 42;    // is the first cast write UB?
    int i = *reinterpret_cast<int*>(data); // how about a read?
    *reinterpret_cast<int*>(data) = 4;     // how about the second write?
    int j = *reinterpret_cast<int*>(data); // or the second read?
    

    When does the lifetime for the int start? Is it with the declaration of data? If so, when does the lifetime of data end?

  2. What if data were a pointer?

    char* data_ptr = new char[sizeof(int)];
    *reinterpret_cast<int*>(data_ptr) = 4;     // is this UB?
    int i = *reinterpret_cast<int*>(data_ptr); // how about the read?
    
  3. What if I'm just receiving structs on the wire and want to conditionally cast them based on what the first byte is?

    // bunch of handle functions that do stuff with the members of these types
    void handle(MsgType1 const& );
    void handle(MsgTypeF const& );
    
    char buffer[100]; 
    ::recv(some_socket, buffer, 100)
    
    switch (buffer[0]) {
    case '1':
        handle(*reinterpret_cast<MsgType1*>(buffer)); // is this UB?
        break;
    case 'F':
        handle(*reinterpret_cast<MsgTypeF*>(buffer));
        break;
    // ...
    }
    

Are any of these cases UB? Are all of them? Does the answer to this question change between C++11 to C++1z?

like image 391
Barry Avatar asked Sep 10 '16 18:09

Barry


People also ask

What does reinterpret_cast mean?

reinterpret_cast is a type of casting operator used in C++. It is used to convert a pointer of some data type into a pointer of another data type, even if the data types before and after conversion are different. It does not check if the pointer type and data pointed by the pointer is same or not.

Can reinterpret_cast throw exception?

No. It is a purely compile-time construct. It is very dangerous, because it lets you get away with very wrong conversions.

Is reinterpret_cast safe?

The result of a reinterpret_cast cannot safely be used for anything other than being cast back to its original type. Other uses are, at best, nonportable. The reinterpret_cast operator cannot cast away the const , volatile , or __unaligned attributes.

Is reinterpret cast compile-time?

The dynamic cast is the only that needs to be "calculated" in run-time. All other casts are calculated in compile-time. The machine code for a static_cast is a fixed function based on the type you are casting FROM and TO. For reinterpret_cast , the machine code can be resolved in compile-time as well.


1 Answers

There are two rules at play here:

  1. [basic.lval]/8, aka, the strict aliasing rule: simply put, you can't access an object through a pointer/reference to the wrong type.

  2. [base.life]/8: simply put, if you reuse storage for an object of a different type, you can't use pointers to the old object(s) without laundering them first.

These rules are an important part of making a distinction between "a memory location" or "a region of storage" and "an object".

All of your code examples fall prey to the same problem: they're not the object you cast them to:

alignas(int) char data[sizeof(int)];

That creates an object of type char[sizeof(int)]. That object is not an int. Therefore, you may not access it as if it were. It doesn't matter if it is a read or a write; you still provoke UB.

Similarly:

char* data_ptr = new char[sizeof(int)];

That also creates an object of type char[sizeof(int)].

char buffer[100];

This creates an object of type char[100]. That object is neither a MsgType1 nor a MsgTypeF. So you cannot access it as if it were either.

Note that the UB here is when you access the buffer as one of the Msg* types, not when you check the first byte. If all your Msg* types are trivially copyable, it's perfectly acceptable to read the first byte, then copy the buffer into an object of the appropriate type.

switch (buffer[0]) {
case '1':
    {
        MsgType1 msg;
        memcpy(&msg, buffer, sizeof(MsgType1);
        handle(msg);
    }
    break;
case 'F':
    {
        MsgTypeF msg;
        memcpy(&msg, buffer, sizeof(MsgTypeF);
        handle(msg);
    }
    break;
// ...
}

Note that we're talking about what the language states will be undefined behavior. Odds are good that the compiler would be just fine with any of these.

Does the answer to this question change between C++11 to C++1z?

There have been some significant rule clarifications since C++11 (particularly [basic.life]). But the intent behind the rules hasn't changed.

like image 96
Nicol Bolas Avatar answered Sep 18 '22 12:09

Nicol Bolas