What are the cases where reinterpret_casting a char* (or char[N]) is undefined behavior, and when is it defined behavior? What is the rule of thumb I should be using to answer this question?
As we learned from this question, the following is undefined behavior:
alignas(int) char data[sizeof(int)];
int *myInt = new (data) int; // OK
*myInt = 34; // OK
int i = *reinterpret_cast<int*>(data); // <== UB! have to use std::launder
But at what point can we do a reinterpret_cast on a char array and have it NOT be undefined behavior? Here are a few simple examples:
No new, just reinterpret_cast:
alignas(int) char data[sizeof(int)];
*reinterpret_cast<int*>(data) = 42; // is the first cast write UB?
int i = *reinterpret_cast<int*>(data); // how about a read?
*reinterpret_cast<int*>(data) = 4; // how about the second write?
int j = *reinterpret_cast<int*>(data); // or the second read?
When does the lifetime for the int start? Is it with the declaration of data? If so, when does the lifetime of data end?
What if data were a pointer?
char* data_ptr = new char[sizeof(int)];
*reinterpret_cast<int*>(data_ptr) = 4; // is this UB?
int i = *reinterpret_cast<int*>(data_ptr); // how about the read?
What if I'm just receiving structs on the wire and want to conditionally cast them based on what the first byte is?
// bunch of handle functions that do stuff with the members of these types
void handle(MsgType1 const& );
void handle(MsgTypeF const& );
char buffer[100];
::recv(some_socket, buffer, 100, 0);
switch (buffer[0]) {
case '1':
handle(*reinterpret_cast<MsgType1*>(buffer)); // is this UB?
break;
case 'F':
handle(*reinterpret_cast<MsgTypeF*>(buffer));
break;
// ...
}
Are any of these cases UB? Are all of them? Does the answer to this question change between C++11 and C++1z?
There are two rules at play here:
[basic.lval]/8, aka, the strict aliasing rule: simply put, you can't access an object through a pointer/reference to the wrong type.
[basic.life]/8: simply put, if you reuse storage for an object of a different type, you can't use pointers to the old object(s) without laundering them first.
These rules are an important part of making a distinction between "a memory location" or "a region of storage" and "an object".
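To make the second rule concrete, here is a minimal sketch that follows the question's first snippet. It assumes a C++17 compiler (std::launder lives in <new>), and the function name store_and_load is hypothetical, chosen only for illustration. It shows the two ways of reaching the int once placement new has created it: through the pointer that new returned, or through a laundered pointer derived from the array.

```cpp
#include <cassert>
#include <new>  // placement new and std::launder (C++17)

// Hypothetical helper: creates an int inside a char buffer and reads it back.
int store_and_load(int value) {
    alignas(int) char data[sizeof(int)];

    // Placement new starts the lifetime of an int object in this storage.
    int* created = new (data) int(value);

    // The pointer returned by new points to the int, so this read is fine.
    int via_new = *created;

    // A pointer derived from the array still refers to the char array;
    // std::launder is what yields a usable pointer to the new int.
    int via_launder = *std::launder(reinterpret_cast<int*>(data));

    assert(via_new == via_launder);
    return via_launder;
}
```

Where possible, simply keeping the pointer returned by placement new avoids the need for std::launder altogether.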
All of your code examples fall prey to the same problem: the storage does not hold an object of the type you cast it to:
alignas(int) char data[sizeof(int)];
That creates an object of type char[sizeof(int)]. That object is not an int. Therefore, you may not access it as if it were. It doesn't matter if it is a read or a write; you still provoke UB.
Similarly:
char* data_ptr = new char[sizeof(int)];
That also creates an object of type char[sizeof(int)].
char buffer[100];
This creates an object of type char[100]. That object is neither a MsgType1 nor a MsgTypeF. So you cannot access it as if it were either.
Note that the UB here is when you access the buffer as one of the Msg* types, not when you check the first byte. If all your Msg* types are trivially copyable, it's perfectly acceptable to read the first byte, then copy the buffer into an object of the appropriate type.
switch (buffer[0]) {
case '1':
{
MsgType1 msg;
memcpy(&msg, buffer, sizeof(MsgType1));
handle(msg);
}
break;
case 'F':
{
MsgTypeF msg;
memcpy(&msg, buffer, sizeof(MsgTypeF));
handle(msg);
}
break;
// ...
}
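A self-contained version of the same idea might look like the sketch below. The layouts of MsgType1 and MsgTypeF are not given in the question, so the struct definitions here are assumptions made up for illustration, as are the helper names read_message and dispatch. The length check is also an addition: it guards against a short read, which the snippet above does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Assumed layouts; the real message types are not shown in the question.
struct MsgType1 { char tag; char payload[7]; };
struct MsgTypeF { char tag; char code; };

static_assert(std::is_trivially_copyable<MsgType1>::value, "memcpy requires this");
static_assert(std::is_trivially_copyable<MsgTypeF>::value, "memcpy requires this");

// Hypothetical helper: copies sizeof(Msg) bytes from the buffer into out.
// Returns false if fewer than sizeof(Msg) bytes were received.
template <typename Msg>
bool read_message(const char* buffer, std::size_t length, Msg& out) {
    assert(buffer != nullptr);
    if (length < sizeof(Msg)) return false;
    std::memcpy(&out, buffer, sizeof(Msg));
    return true;
}

// Hypothetical dispatcher: returns the tag of the decoded message,
// or '\0' if the tag is unknown or the buffer is too short.
char dispatch(const char* buffer, std::size_t length) {
    if (length == 0) return '\0';
    switch (buffer[0]) {  // reading a char from a char array is fine
    case '1': {
        MsgType1 msg;
        return read_message(buffer, length, msg) ? msg.tag : '\0';
    }
    case 'F': {
        MsgTypeF msg;
        return read_message(buffer, length, msg) ? msg.tag : '\0';
    }
    default:
        return '\0';
    }
}
```

In real code the bodies of the case labels would call the appropriate handle overload with msg; returning the tag here just keeps the sketch easy to check.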
Note that we're talking about what the language states will be undefined behavior. Odds are good that the compiler would be just fine with any of these.
Does the answer to this question change between C++11 and C++1z?
There have been some significant rule clarifications since C++11 (particularly [basic.life]). But the intent behind the rules hasn't changed.