
Is there a reason why Clang does not optimize this code?

Consider this function which I found in this question:

void to_bytes(uint64_t const& x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}

As x and dest might point to the same memory, the compiler is not allowed to optimize this into a single qword move (each line might change the value of x).
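To make the aliasing concern concrete, here is a minimal sketch of a call where dest overlaps the storage of x. The names to_bytes_ref and aliasing_changes_result are made up for this illustration, and the loop is only a compact rewrite of the eight stores above, in the same order.

```cpp
#include <cstdint>
#include <cstring>

// Same stores, in the same order (index 7 down to 0), as the function above.
void to_bytes_ref(std::uint64_t const& x, std::uint8_t* dest) {
    for (int i = 7; i >= 0; --i)
        dest[i] = std::uint8_t(x >> 8 * i);
}

// Hypothetical helper: returns true if writing through an overlapping dest
// produced different bytes than a call without any overlap.
bool aliasing_changes_result() {
    std::uint64_t storage[2] = {0, 0x0102030405060708ULL};

    // Reference result: x is a separate copy, so no store can modify it.
    std::uint8_t expected[8];
    std::uint64_t const copy = storage[1];
    to_bytes_ref(copy, expected);

    // dest starts one byte before storage[1], so seven of the eight
    // stores land inside the object that x refers to.
    std::uint8_t* dest = reinterpret_cast<std::uint8_t*>(storage) + 7;
    to_bytes_ref(storage[1], dest);

    return std::memcmp(dest, expected, sizeof expected) != 0;
}
```

Because x is re-read after every store, the overlapping call ends up with different bytes than a single load followed by a single 8-byte store would produce, which is why the compiler has to keep the individual stores in the by-reference version.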

So far so good.

But if you pass x by value instead, this argument no longer holds. And indeed, GCC optimizes this to a simple mov instruction, as expected: https://godbolt.org/z/iYj1or

However, Clang does not: https://godbolt.org/z/Hgg5z9
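For readers who do not want to open the links, the by-value variant presumably looks like the sketch below. The name to_bytes_by_value is an assumption made here to distinguish it from the by-reference version; the linked Compiler Explorer pages remain the authoritative source for the exact code and the generated assembly.

```cpp
#include <cstdint>

// x is now a local copy, so no store through dest can legally change it.
void to_bytes_by_value(std::uint64_t x, std::uint8_t* dest) {
    dest[7] = std::uint8_t(x >> 8*7);
    dest[6] = std::uint8_t(x >> 8*6);
    dest[5] = std::uint8_t(x >> 8*5);
    dest[4] = std::uint8_t(x >> 8*4);
    dest[3] = std::uint8_t(x >> 8*3);
    dest[2] = std::uint8_t(x >> 8*2);
    dest[1] = std::uint8_t(x >> 8*1);
    dest[0] = std::uint8_t(x >> 8*0);
}
```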

I'm assuming that, as it is not even guaranteed that x occupies any stack memory at all, any attempt to make dest point to x before the function is called would result in undefined behavior and thus the compiler can assume that this just never happens. That would mean clang is missing some opportunity here. But I'm not sure. Can somebody clarify?

sebrockm asked May 13 '19 11:05



1 Answer

The code you've given is way overcomplicated. You can replace it with:

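#include <cstdint>
#include <cstring>   // std::memcpy
#include <endian.h>  // htole64 (glibc)

void to_bytes(uint64_t x, uint8_t* dest) {
    x = htole64(x);                    // host byte order -> little-endian
    std::memcpy(dest, &x, sizeof(x));  // copy all 8 bytes at once
}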

Yes, this uses the Linux-ism htole64(), but if you're on another platform you can easily reimplement that.
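As a rough idea of what such a reimplementation could look like, here is a minimal sketch that does not depend on <endian.h> or on knowing the host byte order. The names portable_htole64 and to_bytes_portable are hypothetical, and this is only one possible approach, not a drop-in copy of the glibc function.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for htole64(): returns a value whose in-memory
// representation holds the bytes of x in little-endian order.
inline std::uint64_t portable_htole64(std::uint64_t x) {
    unsigned char le[8];
    for (int i = 0; i < 8; ++i)
        le[i] = static_cast<unsigned char>(x >> (8 * i));  // byte i = bits 8i..8i+7
    std::uint64_t out;
    std::memcpy(&out, le, sizeof out);
    return out;
}

// Same shape as to_bytes() above, using the stand-in instead of htole64().
void to_bytes_portable(std::uint64_t x, std::uint8_t* dest) {
    x = portable_htole64(x);
    std::memcpy(dest, &x, sizeof x);
}
```

On a little-endian host portable_htole64 returns its argument unchanged, and on a big-endian host it returns the byte-swapped value. Whether a particular compiler folds the loop away is something to check in its output rather than assume.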

Clang and GCC optimize this perfectly, on both little- and big-endian platforms.

John Zwinck answered Oct 12 '22 23:10