
Is Clang really this smart?

If I compile the following code with Clang 3.3 using -O3 -fno-vectorize, I get the same assembly output even if I remove the commented line. The code type-puns every possible 32-bit integer to a float and counts the values that fall in the range [0, 1]. Is Clang's optimizer actually smart enough to realize that 0xFFFFFFFF, when punned to float, is not in the range [0, 1], and therefore to drop the second call to fn entirely? GCC produces different code when the second call is removed.

#include <limits>
#include <cstring>
#include <cstdint>

template <class TO, class FROM>
inline TO punning_cast(const FROM &input)
{
    TO out;
    std::memcpy(&out, &input, sizeof(TO));
    return out;
}

int main()
{
    uint32_t count = 0;

    auto fn = [&count] (uint32_t x) {
        float f = punning_cast<float>(x);
        if (f >= 0.0f && f <= 1.0f)
            count++;
    };

    for(uint32_t i = 0; i < std::numeric_limits<uint32_t>::max(); ++i)
    {
        fn(i);
    }
    fn(std::numeric_limits<uint32_t>::max()); //removing this changes nothing

    return count;
}

See here: http://goo.gl/YZPw5i
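For reference, the claim that 0xFFFFFFFF is outside [0, 1] can be checked directly. The following is a minimal sketch, assuming `float` is an IEEE-754 binary32 value (enforced by the `static_assert`); the helper names `bits_to_float`, `in_unit_range` and `check_all_ones_pattern` are hypothetical and do not appear in the code above. Under that assumption the all-ones pattern is a NaN, and every ordered comparison with a NaN is false.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 &&
                  sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes IEEE-754 binary32 float");

// Hypothetical helper: reinterpret a 32-bit pattern as a float via memcpy,
// the same technique punning_cast uses.
inline float bits_to_float(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Same range test as the lambda in the question.
inline bool in_unit_range(std::uint32_t bits)
{
    const float f = bits_to_float(bits);
    return f >= 0.0f && f <= 1.0f;
}

// Hypothetical self-check: the all-ones pattern has an all-ones exponent and a
// non-zero mantissa, so it is a NaN and fails both comparisons.
inline void check_all_ones_pattern()
{
    assert(std::isnan(bits_to_float(0xFFFFFFFFu)));
    assert(!in_unit_range(0xFFFFFFFFu));
}
```

Calling `check_all_ones_pattern()` from a `main` function should pass silently, which is consistent with the extra call to `fn` contributing nothing to `count`.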

Chris_F asked May 29 '14 05:05


1 Answer

Yes, it looks like Clang really is this smart.

Test:

#include <limits>
#include <cstring>
#include <cstdint>

template <class TO, class FROM>
inline TO punning_cast(const FROM &input)
{
    TO out;
    std::memcpy(&out, &input, sizeof(TO));
    return out;
}

int main()
{
    uint32_t count = 0;

    auto fn = [&count] (uint32_t x) {
        float f = punning_cast<float>(x);
        if (f >= 0.0f && f <= 1.0f)
            count++;
    };

    for(uint32_t i = 0; i < std::numeric_limits<uint32_t>::max(); ++i)
    {
        fn(i);
    }
#ifdef X
    fn(0x3f800000); /* 1.0f */
#endif

    return count;
}

Result:

$ c++ -S -DX -O3 foo.cpp -std=c++11 -o foo.s
$ c++ -S -O3 foo.cpp -std=c++11 -o foo2.s
$ diff foo.s foo2.s
100d99
<   incl    %eax

The only difference between the two outputs is a single incl instruction: Clang has reduced the call to fn(0x3f800000) to a bare increment, because that bit pattern decodes to 1.0f, which lies inside [0, 1]. This is correct.

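My guess is that Clang evaluates these calls at compile time because they only involve constants, and that it can follow the constant through the memcpy used for type punning (probably by simply emulating its effect on the constant value). By the same reasoning, the fn(0xFFFFFFFF) call in the question folds away to nothing: assuming IEEE-754 floats, that bit pattern is a NaN, and every ordered comparison with a NaN is false.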
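To make the folded results concrete, here is a small sketch of what a single call to fn adds to count for a given constant. It assumes IEEE-754 binary32 floats, and the names `contribution` and `check_contributions` are hypothetical. It only illustrates that the outcome is fully determined by the constant argument; it says nothing about how Clang arrives at it internally.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 &&
                  sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes IEEE-754 binary32 float");

// Hypothetical helper: the amount one fn(bits) call adds to `count`.
inline std::uint32_t contribution(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);           // same punning as punning_cast
    return (f >= 0.0f && f <= 1.0f) ? 1u : 0u;  // same range test as the lambda
}

// Hypothetical self-check covering the constants discussed in this thread.
inline void check_contributions()
{
    assert(contribution(0x3f800000u) == 1u);  // 1.0f: matches the extra incl
    assert(contribution(0xFFFFFFFFu) == 0u);  // NaN: the call has no effect
    assert(contribution(0x3f800001u) == 0u);  // next float above 1.0f
    assert(contribution(0x00000000u) == 1u);  // +0.0f
}
```

A contribution of 1 corresponds to the single incl in the diff above, and a contribution of 0 corresponds to the question's observation that removing the call changes nothing.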

nneonneo answered Oct 18 '22 19:10