I'm writing a websocket server and I have to deal with masked data that I need to unmask.
The mask is unsigned char[4], and the data is a unsigned char* buffer as well.
I don't want to XOR byte by byte, I'd much rather XOR 4-bytes at a time.
uint32_t * const end = reinterpret_cast<uint32_t *>(data_+length);
for(uint32_t *i = reinterpret_cast<uint32_t *>(data_); i != end; ++i) {
*i ^= mask_;
}
Is there anything wrong with the use of reinterpret_cast in this situation?
The alternative would be the following code which isn't as clear and not as fast:
uint64_t j = 0;
uint8_t *end = data_+length;
for(uint8_t *i = data_; i != end; ++i,++j) {
*i ^= mask_[j % 4];
}
I'm all ears for alternatives, including ones dependent on c++11 features.
The are a few potential problems with the approach posted:
char
needs to be aligned properly to be accessible. A typical requirement for uint32_t
is that the object is aligned to an address divisible by four.length / sizeof(uint32_t) != 0
the loop may never terminate.mask
needs to contain different values. If mask
is produced by *reinterpret_cast<uint32_t>(char_mask)
of a suitable array this shouldn't be an array.If these issues are taken care of, reinterpret_cast<...>(...)
can be used in the situation you have. Reinterpreting the meaning of pointers is one of the reasons this operation is there and sometimes it is needed. I would create a suitable test case to verify that it works properly, though, to avoid having to hunt down problems when porting the code to a different platform.
Personally I would go with a different approach until profiling shows that it is too slow:
char* it(data);
if (4 < length) {
for (char* end(data + length - 4); it < end; it += 4) {
it[0] ^= mask_[0];
it[1] ^= mask_[1];
it[2] ^= mask_[2];
it[3] ^= mask_[3];
}
}
it != data + length && *it++ ^= mask_[0];
it != data + length && *it++ ^= mask_[1];
it != data + length && *it++ ^= mask_[2];
it != data + length && *it++ ^= mask_[3];
I'm definitely using a number of similar approaches in software which meant to be really faster and haven't found them to be a notable performance problem.
There's nothing specifically wrong with reinterpret_cast
in this case. But, take care.
The 32-bit loop as it stands is incorrect, because it doesn't cater for the case where the payload isn't a multiple of 32 bits in size. Two possible solutions, I suppose:
!=
with <
in the for loop check (there's a reason why people use <
, and it's not because they're dumb...) and do the trailing 1-3 bytes bytewiseAdditionally, depending on how the code is structured you might also have to cope with misaligned data accesses for some CPUs. If you have an entire frame buffered, header and all, in a buffer that's 32-bit aligned, and if the payload length is <126 bytes or >65,535 bytes, then both the masking key and the payload will be misaligned.
For whatever it's worth, my server uses something like the first loop:
for(int i=0;i<n;++i)
payload[i]^=key[i&3];
Unlike the 32-bit option, this is basically impossible to get wrong.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With