
How to reinterpret a sequence of bytes as a POD structure without causing UB?

Suppose we get some data as a sequence of bytes, and want to reinterpret that sequence as a structure (having some guarantees that the data is indeed in the correct format). For example:

#include <fstream>
#include <vector>
#include <cstdint>
#include <cstdlib>
#include <iostream>

struct Data
{
    std::int32_t someDword[629835];
    std::uint16_t someWord[9845];
    std::int8_t someSignedByte;
};

Data* magic_reinterpret(void* raw)
{
    return reinterpret_cast<Data*>(raw); // BAD! Breaks strict aliasing rules!
}

std::vector<char> getDataBytes()
{
    std::ifstream file("file.bin",std::ios_base::binary);
    if(!file) std::abort();
    std::vector<char> rawData(sizeof(Data));
    file.read(rawData.data(),sizeof(Data));
    if(!file) std::abort();
    return rawData;
}

int main()
{
    auto rawData=getDataBytes();
    Data* data=magic_reinterpret(rawData.data());
    std::cout << "someWord[346]=" << data->someWord[346] << "\n";
    data->someDword[390875]=23235;
    std::cout << "someDword[390875]=" << data->someDword[390875] << "\n";
}

The magic_reinterpret above is bad: it breaks the strict aliasing rules and thus causes UB.

How should it be implemented instead, so that it neither causes UB nor copies the data (as memcpy would)?


EDIT: the getDataBytes() function above should be treated as a function that cannot be changed. A real-world example is ptrace(2), which on Linux, when request==PTRACE_GETREGSET and addr==NT_PRSTATUS, writes (on x86-64) one of two possible structures of different sizes, depending on tracee bitness, and returns the size. The code calling ptrace can't predict which type of structure it will get until it actually makes the call. How can it then safely reinterpret the result as the correct pointer type?

Ruslan asked Dec 11 '15 12:12


2 Answers

By not reading the file as a stream of bytes, but as a stream of Data structures.

Simply do e.g.

Data data;
file.read(reinterpret_cast<char*>(&data), sizeof(data));
Some programmer dude answered Nov 07 '22 17:11


I think there is a special exception to the strict aliasing rules for all the char types (signed, unsigned, and plain). So I think all you have to do is change the signature of magic_reinterpret to:

Data* magic_reinterpret(char *raw)

Unfortunately, this doesn't work. As deviantfan commented, you can read (or write) a Data as a series of [unsigned] char, but you can't read or write a char buffer as a Data. The answer by Joachim is correct.
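
A small sketch of that asymmetry, using a hypothetical two-field struct Header in place of Data to keep it short. The names firstByte and fromBytes are made up for illustration. Note that the second function does copy the bytes, which the question wanted to avoid; it is shown only because it is the direction that is well-defined when all you have is a char buffer.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical small POD standing in for Data.
struct Header
{
    std::uint16_t a;
    std::uint16_t b;
};

static_assert(std::is_trivially_copyable<Header>::value,
              "Header must be trivially copyable");

// Allowed: look at the bytes of an existing object through unsigned char*.
unsigned char firstByte(const Header& h)
{
    return *reinterpret_cast<const unsigned char*>(&h);
}

// Allowed: copy bytes from a char buffer into a real Header object.
// The caller must guarantee that raw points to at least sizeof(Header) bytes.
Header fromBytes(const char* raw)
{
    Header h;
    std::memcpy(&h, raw, sizeof h);
    return h;
}

// Not allowed: reinterpret_cast<const Header*>(raw)->a on a plain char buffer,
// because no Header object lives at that address.
```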

Having said all that, if you are reading from a network or file, the extra overhead of reading your input as a series of octets and calculating the fields from the buffer is going to be negligible (and it will let you cope with changes in layout between compiler versions and machines).
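
As a rough sketch of what "calculating the fields from a buffer" could look like: the code below assumes the input stores integers in little-endian order with no padding, which is an assumption and not something stated in the question. Record, kRecordSize, and the load helpers are hypothetical names; only unsigned fields are shown to keep the conversions simple.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: the input is little-endian, whatever the host byte order is.
std::uint16_t loadU16LE(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

std::uint32_t loadU32LE(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical record: 4-byte id followed by 2-byte flags, 6 octets in total.
struct Record
{
    std::uint32_t id;
    std::uint16_t flags;
};

constexpr std::size_t kRecordSize = 6;

// Builds a Record field by field; p must point to at least kRecordSize octets.
Record decodeRecord(const unsigned char* p)
{
    Record r;
    r.id = loadU32LE(p);
    r.flags = loadU16LE(p + 4);
    return r;
}
```

Because each field is computed from individual octets, the in-memory layout of Record (padding, alignment, host endianness) no longer has to match the layout of the input.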

Martin Bonner supports Monica answered Nov 07 '22 15:11