Is there a reason why Clang does not optimize this code?

Tags: c++, optimization, clang

Consider this function, which I found in this question:

#include <cstdint>

void to_bytes(uint64_t const& x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}

Because x is passed by reference, x and dest might refer to the same memory, so the compiler is not allowed to merge this into a single qword move: each store might change the value of x that the next line reads.
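To make "each line might change the value of x" concrete, here is a small sketch (not from the original post) that builds a uint64_t inside a byte buffer and points dest one byte before it, so dest[1] through dest[7] overlap the bytes of x. The names to_bytes_ref, to_bytes_val and run_overlapping are hypothetical, and it assumes uint8_t is unsigned char, as on mainstream GCC/Clang targets.

```cpp
#include <cassert>   // assert, for checking the results
#include <cstdint>
#include <cstring>   // std::memcpy, std::memcmp
#include <new>       // placement new

// The function from the question: x is re-read after every store.
void to_bytes_ref(uint64_t const& x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}

// Same body, but x is a private copy, so the stores cannot change it.
void to_bytes_val(uint64_t x, uint8_t* dest) {
    dest[7] = uint8_t(x >> 8*7);
    dest[6] = uint8_t(x >> 8*6);
    dest[5] = uint8_t(x >> 8*5);
    dest[4] = uint8_t(x >> 8*4);
    dest[3] = uint8_t(x >> 8*3);
    dest[2] = uint8_t(x >> 8*2);
    dest[1] = uint8_t(x >> 8*1);
    dest[0] = uint8_t(x >> 8*0);
}

// Hypothetical helper: creates a uint64_t at buf + 8, then calls f with
// dest = buf + 7, so dest[1]..dest[7] are bytes of that uint64_t.
template <class F>
void run_overlapping(F f, uint8_t out[8]) {
    alignas(8) unsigned char buf[16] = {};
    uint64_t* px = ::new (static_cast<void*>(buf + 8)) uint64_t(0x0102030405060708u);
    f(*px, buf + 7);               // assumes uint8_t is unsigned char
    std::memcpy(out, buf + 7, 8);  // copy out the 8 bytes that f wrote
}
```

On a little-endian machine, `run_overlapping(to_bytes_ref, a)` should leave all eight bytes of `a` equal to 0x01, because each store overwrites a byte of x that a later line reads, while `run_overlapping(to_bytes_val, b)` yields 08 07 06 05 04 03 02 01.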

So far so good.

But if you pass x by value instead, this argument no longer holds. And indeed, GCC optimizes this to a single mov instruction, as expected: https://godbolt.org/z/iYj1or

However, clang does not: https://godbolt.org/z/Hgg5z9

I'm assuming that, since x is not even guaranteed to occupy any stack memory, any attempt to make dest point to x before the function is called would result in undefined behavior, so the compiler can assume this never happens. That would mean Clang is missing an optimization opportunity here. But I'm not sure. Can somebody clarify?
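If the interface has to keep taking x by reference, one option (my own suggestion, not from the post) is to copy it into a local before any store. The local's address never escapes, so dest cannot alias it, which makes the function semantically equivalent to the by-value version; whether a particular Clang version then merges the stores is not something this sketch guarantees. The name to_bytes_copy is hypothetical.

```cpp
#include <cassert>   // assert, for checking the results
#include <cstdint>

// Hypothetical variant: snapshot x first, so later stores through dest
// cannot change the value being serialized.
void to_bytes_copy(uint64_t const& x, uint8_t* dest) {
    const uint64_t v = x;  // local copy; dest cannot point into it
    for (int i = 0; i < 8; ++i)
        dest[i] = uint8_t(v >> 8*i);  // least significant byte first
}
```

Even if dest points at the caller's x, the output is the little-endian encoding of the original value, because only the snapshot v is read.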


asked May 13 '19 11:05

sebrockm

1 Answer

The code you've given is way overcomplicated. You can replace it with:

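#include <cstdint>
#include <cstring>   // std::memcpy
#include <endian.h>  // htole64() (glibc)

void to_bytes(uint64_t x, uint8_t* dest) {
    x = htole64(x);                    // reorder bytes to little-endian (a no-op on little-endian hosts)
    std::memcpy(dest, &x, sizeof(x));  // copy all 8 bytes at once
}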

Yes, this uses the Linux-ism htole64(), but if you're on another platform you can easily reimplement that.
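As a minimal sketch of such a reimplementation, assuming no platform endian header is available: build the little-endian byte sequence with shifts into a local array and read it back with memcpy. This works regardless of host byte order, and since the array is local nothing can alias it. The name to_le64 is hypothetical; in C++20, checking std::endian::native from <bit> would be another option.

```cpp
#include <cassert>   // assert, for checking the results
#include <cstdint>
#include <cstring>   // std::memcpy

// Hypothetical portable stand-in for htole64(): returns a value whose
// in-memory byte order is little-endian, whatever the host byte order.
uint64_t to_le64(uint64_t x) {
    unsigned char bytes[8];
    for (int i = 0; i < 8; ++i)
        bytes[i] = static_cast<unsigned char>(x >> 8*i);  // least significant byte first
    uint64_t r;
    std::memcpy(&r, bytes, sizeof(r));  // reinterpret those bytes as a uint64_t
    return r;
}

void to_bytes(uint64_t x, uint8_t* dest) {
    x = to_le64(x);
    std::memcpy(dest, &x, sizeof(x));
}
```

On a little-endian host to_le64 returns x unchanged; on a big-endian host it byte-swaps it.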

Clang and GCC optimize this perfectly, on both little- and big-endian platforms.


answered Oct 12 '22 23:10

John Zwinck
