I want to convert a uint64_t to a uint8_t[8] (little endian). On a little-endian architecture you can just do an ugly reinterpret_cast<> or memcpy(), e.g.:
#include <cstdint>
#include <cstring>

void from_memcpy(const std::uint64_t &x, std::uint8_t* bytes) {
    std::memcpy(bytes, &x, sizeof(x));
}
This generates efficient assembly:
mov rax, qword ptr [rdi]
mov qword ptr [rsi], rax
ret
However it is not portable: it will produce different output on a big-endian machine.
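If C++20 is available, one way to keep the memcpy shortcut from silently misbehaving is to pin down the byte order at compile time with std::endian. A minimal sketch (the function name and the fail-to-compile policy are my own choices, not anything the compilers require):

#include <bit>
#include <cstdint>
#include <cstring>

void from_memcpy_checked(const std::uint64_t &x, std::uint8_t* bytes) {
    // std::endian is C++20 (<bit>); refuse to compile on targets where
    // the raw copy would produce big-endian output.
    static_assert(std::endian::native == std::endian::little,
                  "memcpy shortcut assumes a little-endian target");
    std::memcpy(bytes, &x, sizeof(x));
}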
For converting uint8_t[8] to uint64_t there is a great solution: just do this:
void to(const std::uint8_t* bytes, std::uint64_t &x) {
    x = (std::uint64_t(bytes[0]) << 8*0) |
        (std::uint64_t(bytes[1]) << 8*1) |
        (std::uint64_t(bytes[2]) << 8*2) |
        (std::uint64_t(bytes[3]) << 8*3) |
        (std::uint64_t(bytes[4]) << 8*4) |
        (std::uint64_t(bytes[5]) << 8*5) |
        (std::uint64_t(bytes[6]) << 8*6) |
        (std::uint64_t(bytes[7]) << 8*7);
}
This looks inefficient, but with Clang at -O2 it actually generates exactly the same assembly as before, and whenever the byte order being read doesn't match the target's native order, it is smart enough to emit a native byte-swap instruction. E.g. this code, which reads big-endian input:
void to(const std::uint8_t* bytes, std::uint64_t &x) {
    x = (std::uint64_t(bytes[7]) << 8*0) |
        (std::uint64_t(bytes[6]) << 8*1) |
        (std::uint64_t(bytes[5]) << 8*2) |
        (std::uint64_t(bytes[4]) << 8*3) |
        (std::uint64_t(bytes[3]) << 8*4) |
        (std::uint64_t(bytes[2]) << 8*5) |
        (std::uint64_t(bytes[1]) << 8*6) |
        (std::uint64_t(bytes[0]) << 8*7);
}
Compiles to:
mov rax, qword ptr [rdi]
bswap rax
mov qword ptr [rsi], rax
ret
My question is: is there an equivalent reliably-optimised construct for converting in the opposite direction? I've tried this, but it gets compiled naively:
void from(const std::uint64_t &x, std::uint8_t* bytes) {
    bytes[0] = x >> 8*0;
    bytes[1] = x >> 8*1;
    bytes[2] = x >> 8*2;
    bytes[3] = x >> 8*3;
    bytes[4] = x >> 8*4;
    bytes[5] = x >> 8*5;
    bytes[6] = x >> 8*6;
    bytes[7] = x >> 8*7;
}
Edit: After some experimentation, this code does get compiled optimally by GCC 8.1 and later, as long as you declare the destination as uint8_t* __restrict__ bytes (see the sketch below). However, I still haven't managed to find a form that Clang will optimise.
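For concreteness, this is the form I mean (the function name is mine; __restrict__ is a GCC/Clang extension, not standard C++):

#include <cstdint>

void from_restrict(const std::uint64_t &x, std::uint8_t* __restrict__ bytes) {
    // __restrict__ promises that `bytes` does not alias `x`, which lets
    // GCC 8.1+ merge the eight single-byte stores into one 64-bit store.
    bytes[0] = x >> 8*0;
    bytes[1] = x >> 8*1;
    bytes[2] = x >> 8*2;
    bytes[3] = x >> 8*3;
    bytes[4] = x >> 8*4;
    bytes[5] = x >> 8*5;
    bytes[6] = x >> 8*6;
    bytes[7] = x >> 8*7;
}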
Here's what I could come up with based on the discussion in the OP's comments:
void from_optimized(const std::uint64_t &x, std::uint8_t* bytes) {
    // Assemble the result in a local, then emit a single 64-bit store.
    std::uint64_t big;
    std::uint8_t* temp = (std::uint8_t*)&big;
    temp[0] = x >> 8*0;
    temp[1] = x >> 8*1;
    temp[2] = x >> 8*2;
    temp[3] = x >> 8*3;
    temp[4] = x >> 8*4;
    temp[5] = x >> 8*5;
    temp[6] = x >> 8*6;
    temp[7] = x >> 8*7;
    // Note: storing through a uint64_t* into a uint8_t buffer formally
    // violates strict aliasing (see the memcpy variant further down).
    std::uint64_t* dest = (std::uint64_t*)bytes;
    *dest = big;
}
Writing the bytes into a local and then performing one 64-bit store appears to make the aliasing situation clear enough for the compiler to optimise it fully (on both GCC and Clang with -O2).
Compiling for x86-64 (little endian) with Clang 8.0.0 (tested on Godbolt):
mov rax, qword ptr [rdi]
mov qword ptr [rsi], rax
ret
Compiling for aarch64_be (big endian) with Clang 8.0.0 (tested on Godbolt):
ldr x8, [x0]
rev x8, x8
str x8, [x1]
ret
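As an aside: if the (std::uint64_t*)bytes store in from_optimized worries you, the same single-store idea can be expressed with std::memcpy on the destination side, which avoids the aliasing cast entirely. A sketch (the function name is mine, and I haven't checked it on the same compiler versions beyond the usual observation that a fixed-size memcpy is folded into one store):

#include <cstdint>
#include <cstring>

void from_memcpy_out(const std::uint64_t &x, std::uint8_t* bytes) {
    std::uint64_t out;
    std::uint8_t* temp = (std::uint8_t*)&out;  // byte access to an object is allowed
    temp[0] = x >> 8*0;
    temp[1] = x >> 8*1;
    temp[2] = x >> 8*2;
    temp[3] = x >> 8*3;
    temp[4] = x >> 8*4;
    temp[5] = x >> 8*5;
    temp[6] = x >> 8*6;
    temp[7] = x >> 8*7;
    // An 8-byte memcpy is normally compiled down to a single 64-bit store.
    std::memcpy(bytes, &out, sizeof(out));
}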