Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is reinterpret_cast bad when dealing with low-level byte manipulation?

I'm writing a websocket server and I have to deal with masked data that I need to unmask.

The mask is unsigned char[4], and the data is a unsigned char* buffer as well.

I don't want to XOR byte by byte, I'd much rather XOR 4-bytes at a time.

uint32_t * const end = reinterpret_cast<uint32_t *>(data_+length);
for(uint32_t *i = reinterpret_cast<uint32_t *>(data_); i != end; ++i) {
    *i ^= mask_;
}

Is there anything wrong with the use of reinterpret_cast in this situation?

The alternative would be the following code which isn't as clear and not as fast:

uint64_t j = 0;
uint8_t *end = data_+length;
for(uint8_t *i = data_; i != end; ++i,++j) {
    *i ^= mask_[j % 4];
}

I'm all ears for alternatives, including ones dependent on c++11 features.

like image 550
matthewaveryusa Avatar asked Dec 30 '12 23:12

matthewaveryusa


2 Answers

The are a few potential problems with the approach posted:

  1. On some systems objects of a type bigger than char needs to be aligned properly to be accessible. A typical requirement for uint32_t is that the object is aligned to an address divisible by four.
  2. If length / sizeof(uint32_t) != 0 the loop may never terminate.
  3. Depending on the endianess of the system mask needs to contain different values. If mask is produced by *reinterpret_cast<uint32_t>(char_mask) of a suitable array this shouldn't be an array.

If these issues are taken care of, reinterpret_cast<...>(...) can be used in the situation you have. Reinterpreting the meaning of pointers is one of the reasons this operation is there and sometimes it is needed. I would create a suitable test case to verify that it works properly, though, to avoid having to hunt down problems when porting the code to a different platform.

Personally I would go with a different approach until profiling shows that it is too slow:

char* it(data);
if (4 < length) {
    for (char* end(data + length - 4); it < end; it += 4) {
        it[0] ^= mask_[0];
        it[1] ^= mask_[1];
        it[2] ^= mask_[2];
        it[3] ^= mask_[3];
    }
}
it != data + length && *it++ ^= mask_[0];
it != data + length && *it++ ^= mask_[1];
it != data + length && *it++ ^= mask_[2];
it != data + length && *it++ ^= mask_[3];

I'm definitely using a number of similar approaches in software which meant to be really faster and haven't found them to be a notable performance problem.

like image 184
Dietmar Kühl Avatar answered Oct 19 '22 14:10

Dietmar Kühl


There's nothing specifically wrong with reinterpret_cast in this case. But, take care.

The 32-bit loop as it stands is incorrect, because it doesn't cater for the case where the payload isn't a multiple of 32 bits in size. Two possible solutions, I suppose:

  • replace the != with < in the for loop check (there's a reason why people use <, and it's not because they're dumb...) and do the trailing 1-3 bytes bytewise
  • arrange the buffer so that the size of the buffer for the payload part is a multiple of 32 bits, and just XOR the extra bytes. (Presumably the code checks the payload length when returning bytes to the caller, so this doesn't matter.)

Additionally, depending on how the code is structured you might also have to cope with misaligned data accesses for some CPUs. If you have an entire frame buffered, header and all, in a buffer that's 32-bit aligned, and if the payload length is <126 bytes or >65,535 bytes, then both the masking key and the payload will be misaligned.

For whatever it's worth, my server uses something like the first loop:

for(int i=0;i<n;++i)
    payload[i]^=key[i&3];

Unlike the 32-bit option, this is basically impossible to get wrong.

like image 41
Tom Seddon Avatar answered Oct 19 '22 16:10

Tom Seddon