Consider this function which I found in this question:
void to_bytes(uint64_t const& x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}
As x and dest might point to the same memory, the compiler is not allowed to optimize this into a single qword move (each line might change the value of x).
So far so good.
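To make the aliasing concern concrete, here is a small sketch. The names to_bytes_ref and overlap_changes_result are made up for illustration, and it assumes uint8_t is an alias for unsigned char (true on mainstream platforms), so that the byte pointer may legally point into the uint64_t. When dest overlaps x at an offset, each store can change what a later read of x sees, so the output differs from the non-overlapping case:

```cpp
#include <cstdint>
#include <cstring>

// Same behavior as the function in the question (stores go from dest[7]
// down to dest[0]); renamed so this sketch is self-contained.
void to_bytes_ref(std::uint64_t const& x, std::uint8_t* dest) {
    for (int i = 7; i >= 0; --i)
        dest[i] = std::uint8_t(x >> 8 * i);
}

// Returns true if calling the function with an overlapping destination
// yields different bytes than calling it with a separate buffer.
bool overlap_changes_result() {
    const std::uint64_t value = 0x0102030405060708ULL;

    // Case 1: no overlap between x and dest.
    std::uint8_t plain[8];
    to_bytes_ref(value, plain);

    // Case 2: x is words[1], and dest starts one byte *before* x,
    // so every store except the one to dest[0] lands inside x.
    std::uint64_t words[2] = {0, value};
    std::uint8_t* dest = reinterpret_cast<std::uint8_t*>(words) + 7;
    to_bytes_ref(words[1], dest);

    return std::memcmp(plain, dest, 8) != 0;
}
```

A single 8-byte load followed by a single 8-byte store would not reproduce the second case, which is why the compiler has to be conservative with the by-reference signature.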
But if you pass x by value instead, this argument no longer holds.
And indeed, GCC optimizes this to a simple mov instruction, as expected: https://godbolt.org/z/iYj1or
However, clang does not: https://godbolt.org/z/Hgg5z9
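For reference, the by-value variant presumably looks like the sketch below. This is an assumption, since the exact code behind the links is not reproduced here, and the name to_bytes_by_value is hypothetical. Because x is now a private copy, no store through dest can legally modify it:

```cpp
#include <cstdint>

// By-value variant: x is a local copy, so writes through dest cannot
// alias it. The byte order written is little-endian regardless of the
// host's endianness.
void to_bytes_by_value(std::uint64_t x, std::uint8_t* dest) {
    dest[7] = std::uint8_t(x >> 8 * 7);
    dest[6] = std::uint8_t(x >> 8 * 6);
    dest[5] = std::uint8_t(x >> 8 * 5);
    dest[4] = std::uint8_t(x >> 8 * 4);
    dest[3] = std::uint8_t(x >> 8 * 3);
    dest[2] = std::uint8_t(x >> 8 * 2);
    dest[1] = std::uint8_t(x >> 8 * 1);
    dest[0] = std::uint8_t(x >> 8 * 0);
}
```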
I'm assuming that, since x is not even guaranteed to occupy any stack memory at all, any attempt to make dest point to x before the function is called would result in undefined behavior, and thus the compiler can assume that this just never happens. That would mean clang is missing an optimization opportunity here. But I'm not sure. Can somebody clarify?
The code you've given is way overcomplicated. You can replace it with:
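#include <endian.h>  // htole64() (glibc / Linux)
#include <cstdint>
#include <cstring>   // std::memcpy

void to_bytes(uint64_t x, uint8_t* dest) {
    x = htole64(x);                    // convert host order to little-endian
    std::memcpy(dest, &x, sizeof(x));  // copy the 8 bytes in one go
}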
Yes, this uses the Linux-ism htole64(), but if you're on another platform you can easily reimplement that.
Clang and GCC optimize this perfectly, on both little- and big-endian platforms.
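As a rough idea of what such a reimplementation could look like, here is a minimal sketch. The names my_htole64 and to_bytes_portable are hypothetical, and it relies on the __BYTE_ORDER__ / __ORDER_BIG_ENDIAN__ predefined macros and __builtin_bswap64, which are GCC/Clang extensions rather than standard C++. If the macros are missing, it simply assumes a little-endian host:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for htole64(): convert a host-order value to
// little-endian. Uses GCC/Clang-specific macros and builtins.
inline std::uint64_t my_htole64(std::uint64_t x) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return __builtin_bswap64(x);  // big-endian host: reverse the bytes
#else
    return x;  // assumption: host is little-endian, nothing to do
#endif
}

// Same shape as the function above, using the stand-in instead of htole64().
void to_bytes_portable(std::uint64_t x, std::uint8_t* dest) {
    x = my_htole64(x);
    std::memcpy(dest, &x, sizeof(x));
}
```

With a C++20 compiler, std::endian from <bit> would be a more standard way to do the same check.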