Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement fast inverse sqrt without undefined behavior? [duplicate]

From what I understood about strict aliasing rule, this code for fast inverse square root will result in undefined behavior in C++:

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y; // type punning
    i  = 0x5f3759df - ( i >> 1 );
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );

    return y;
}

Does this code indeed cause UB? If yes, how can it be reimplemented in standard compliant way? If not, why not?

Assumptions: before calling this function we have somehow checked that floats are in IEEE 754 32-bit format, sizeof(long)==sizeof(float) and the platform is little-endian.

like image 629
Ruslan Avatar asked Jun 25 '14 09:06

Ruslan


4 Answers

The standard compliant way would be std::memcpy. This should be sufficiently standard-compliant under your specified assumptions. Any reasonable compiler will turn that into a bunch of register moves if possible anyway. Furthermore, we can also alleviate (or at least check) some of your made assumptions using C++11's static_assert and fixed width integer types from <cstdint>. Endianness is irrelevant anyway, since we're not working on any arrays here, if an integer type is little-endian, a floating point type is also.

float Q_rsqrt( float number )
{
    static_assert(std::numeric_limits<float>::is_iec559, 
                  "fast inverse square root requires IEEE-comliant 'float'");
    static_assert(sizeof(float)==sizeof(std::uint32_t), 
                  "fast inverse square root requires 'float' to be 32-bit");
    float x2 = number * 0.5F, y = number;
    std::uint32_t i;
    std::memcpy(&i, &y, sizeof(float));
    i  = 0x5f3759df - ( i >> 1 );
    std::memcpy(&y, &i, sizeof(float));
    return y * ( 1.5F - ( x2 * y * y ) );
}
like image 129
Christian Rau Avatar answered Nov 15 '22 00:11

Christian Rau


You should use memcpy. AFAIK this is the only standard compliant way and compilers are smart enough to replace the call with a single word move instruction. See this question for reasoning behind these claims.

like image 36
yuri kilochek Avatar answered Nov 15 '22 00:11

yuri kilochek


I don't think you can do this without UB in the strict standardese sense, simply because it relies on things like a particular value representation of float and long. The standard does not specify these (on purpose), and to give implementations the freedom to do what they see fit in this regard, specifically applies the "UB" label to all behaviour which would depend on such things.

That doesn't mean this will be a "Heisenbug" or "nasal demons" type of UB; most likely, the implementation will provide definitions for this behaviour. As long as these definitions satisfy the preconditions of the code, it's fine. The preconditions here being:

  • Type punning is allowed by the implementation.
  • sizeof(float) == sizeof(long).
  • The internal representation of float is such that if the bits are interpreted as a long, 0x5f3759df and >> 1 have the meaning reuqired for this hack to function.

However, it is impossible to re-write this in a fully portable way (i.e. without UB) - what would you do about platforms which use a different implementation of floating-point arithmetic, for example. As long as you're relying a particular platform's specifics, you're relying on behaviour which is undefined in the standard, but defined by the platform.

like image 4
Angew is no longer proud of SO Avatar answered Nov 15 '22 00:11

Angew is no longer proud of SO


Not. The algorithm starts out by assuming IEEE754 layout, which isn't a valid Standard-compliant assumption. The particular constant 0x5f3759df would be utterly wrong for any other layout. Even the assumption that such a constant exists is flawed. I think it breaks as soon as you'd reverse the order of the fields inside a float, for instance.

I'm also not sure if Q_rsqrt(-0.0f) works correctly even assuming IEEE754. Both zeroes (positive and negative) should be equal and sqrt(x) is only undefined for x<-0.0f

like image 3
MSalters Avatar answered Nov 14 '22 22:11

MSalters