Suppose we get some data as a sequence of bytes and want to reinterpret that sequence as a structure (given some guarantee that the data is indeed in the correct format). For example:
#include <fstream>
#include <vector>
#include <cstdint>
#include <cstdlib>
#include <iostream>
struct Data
{
    std::int32_t someDword[629835];
    std::uint16_t someWord[9845];
    std::int8_t someSignedByte;
};

Data* magic_reinterpret(void* raw)
{
    return reinterpret_cast<Data*>(raw); // BAD! Breaks strict aliasing rules!
}

std::vector<char> getDataBytes()
{
    std::ifstream file("file.bin", std::ios_base::binary);
    if (!file) std::abort();
    std::vector<char> rawData(sizeof(Data));
    file.read(rawData.data(), sizeof(Data));
    if (!file) std::abort();
    return rawData;
}

int main()
{
    auto rawData = getDataBytes();
    Data* data = magic_reinterpret(rawData.data());
    std::cout << "someWord[346]=" << data->someWord[346] << "\n";
    data->someDword[390875] = 23235;
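    // Print the element that was just written (streaming the array itself would print its address).
    std::cout << "someDword[390875]=" << data->someDword[390875] << "\n";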
}
Now the magic_reinterpret here is actually bad, since it breaks strict aliasing rules and thus causes UB.
How should it be implemented instead, so that it neither causes UB nor copies the data the way memcpy would?
EDIT: the getDataBytes() function above is in fact meant to be treated as unchangeable. A real-world example is ptrace(2), which on Linux, when request==PTRACE_GETREGSET and addr==NT_PRSTATUS, writes (on x86-64) one of two possible structures of different sizes, depending on tracee bitness, and returns the size. Here the code calling ptrace can't predict which type of structure it will get until it actually makes the call. How could it then safely reinterpret the result as the correct pointer type?
By not reading the file as a stream of bytes, but as a stream of Data structures.
Simply do, e.g.:
Data data;
file.read(reinterpret_cast<char*>(&data), sizeof(data));
I think there is a special exception to the strict aliasing rules for all the char types (signed, unsigned, and plain). So I think all you have to do is change the signature of magic_reinterpret to:
Data* magic_reinterpret(char *raw)
Doesn't work, I'm afraid. As commented by deviantfan, you can read (or write) a Data as a series of [unsigned] char, but you can't read or write char as a Data. The answer by Joachim is correct.
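When the byte buffer comes from code that cannot be changed, the direction that is allowed (treating an object as a series of char) can still be used by copying the bytes into a real object. This does make a copy, which the question wanted to avoid, so take it as a minimal sketch of the well-defined fallback rather than a zero-copy answer. Header and from_bytes are hypothetical names, with Header standing in for a small version of Data:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical small structure standing in for Data.
struct Header
{
    std::int32_t someDword;
    std::uint16_t someWord;
    std::int8_t someSignedByte;
};

// Copies the object representation out of a raw byte buffer into a real Header.
// The caller must guarantee that raw points to at least sizeof(Header) bytes
// laid out exactly as this compiler lays out Header.
Header from_bytes(const char* raw)
{
    static_assert(std::is_trivially_copyable<Header>::value,
                  "memcpy is only valid for trivially copyable types");
    Header result;
    std::memcpy(&result, raw, sizeof result); // accesses result as bytes, which is allowed
    return result;
}
```

For small fixed-size structures, optimizing compilers can often turn such a memcpy into plain loads, but that is an optimization and not a guarantee.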
Having said all that, if you are reading from a network or file, the extra overhead of reading your input as a series of octets and calculating the fields from a buffer is going to be negligible (and will allow you to cope with changes in layout between compiler versions and machines).
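Calculating fields from octets could look roughly like the sketch below. It assumes the input stores integers in little-endian order, which the question does not state. The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Assembles a 16-bit unsigned value from two octets, least significant first.
std::uint16_t read_u16_le(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Assembles a 32-bit unsigned value from four octets, least significant first.
std::uint32_t read_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Same as above, then reinterprets the bit pattern as a signed value.
// std::int32_t is two's complement by definition, so copying the bits is portable.
std::int32_t read_i32_le(const unsigned char* p)
{
    const std::uint32_t bits = read_u32_le(p);
    std::int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because each field is built from individual octets, the result does not depend on the host's endianness, padding, or alignment. Only the assumed byte order of the input matters.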