From what I understood about strict aliasing rule, this code for fast inverse square root will result in undefined behavior in C++:
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // type punning
i = 0x5f3759df - ( i >> 1 );
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) );
return y;
}
Does this code indeed cause UB? If yes, how can it be reimplemented in standard compliant way? If not, why not?
Assumptions: before calling this function we have somehow checked that floats are in IEEE 754 32-bit format, sizeof(long)==sizeof(float)
and the platform is little-endian.
The standard compliant way would be std::memcpy
. This should be sufficiently standard-compliant under your specified assumptions. Any reasonable compiler will turn that into a bunch of register moves if possible anyway. Furthermore, we can also alleviate (or at least check) some of your made assumptions using C++11's static_assert
and fixed width integer types from <cstdint>
. Endianness is irrelevant anyway, since we're not working on any arrays here, if an integer type is little-endian, a floating point type is also.
float Q_rsqrt( float number )
{
static_assert(std::numeric_limits<float>::is_iec559,
"fast inverse square root requires IEEE-comliant 'float'");
static_assert(sizeof(float)==sizeof(std::uint32_t),
"fast inverse square root requires 'float' to be 32-bit");
float x2 = number * 0.5F, y = number;
std::uint32_t i;
std::memcpy(&i, &y, sizeof(float));
i = 0x5f3759df - ( i >> 1 );
std::memcpy(&y, &i, sizeof(float));
return y * ( 1.5F - ( x2 * y * y ) );
}
You should use memcpy
. AFAIK this is the only standard compliant way and compilers are smart enough to replace the call with a single word move instruction. See this question for reasoning behind these claims.
I don't think you can do this without UB in the strict standardese sense, simply because it relies on things like a particular value representation of float
and long
. The standard does not specify these (on purpose), and to give implementations the freedom to do what they see fit in this regard, specifically applies the "UB" label to all behaviour which would depend on such things.
That doesn't mean this will be a "Heisenbug" or "nasal demons" type of UB; most likely, the implementation will provide definitions for this behaviour. As long as these definitions satisfy the preconditions of the code, it's fine. The preconditions here being:
sizeof(float) == sizeof(long)
.float
is such that if the bits are interpreted as a long
, 0x5f3759df
and >> 1
have the meaning reuqired for this hack to function.However, it is impossible to re-write this in a fully portable way (i.e. without UB) - what would you do about platforms which use a different implementation of floating-point arithmetic, for example. As long as you're relying a particular platform's specifics, you're relying on behaviour which is undefined in the standard, but defined by the platform.
Not. The algorithm starts out by assuming IEEE754 layout, which isn't a valid Standard-compliant assumption. The particular constant 0x5f3759df
would be utterly wrong for any other layout. Even the assumption that such a constant exists is flawed. I think it breaks as soon as you'd reverse the order of the fields inside a float, for instance.
I'm also not sure if Q_rsqrt(-0.0f)
works correctly even assuming IEEE754. Both zeroes (positive and negative) should be equal and sqrt(x)
is only undefined for x<-0.0f
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With