How does this sqrt approximation inline assembly function work?

Question

Reading through The Tricks of the 3D Game Programming Gurus, I came across this sort function written in inline assembly:

inline float FastSqrt(float Value)
{
    float Result;

    _asm
    {
        mov eax, Value
        sub eax, 0x3F800000
        sar eax, 1
        add eax, 0x3F800000
        mov Result, eax
    }

    return(Result);
}

It is an approximation of the actual square-root, but the accuracy is enough for my needs.

How does this actually work? What is this magical 0x3F800000 value? How are we achieving square root by subtracting, rotating and adding?

Here's how it looks in C/C++ code:

inline float FastSqrt_C(float Value)
{
    float Result;

    long Magic = *((long *)&Value);
    Magic -= 0x3F800000;
    Magic >>= 1;
    Magic += 0x3F800000;
    Result = *((float *)&Magic);

    return(Result);
}

Jester · Accepted Answer

Many people have pointed out that 0x3f800000 is the representation of 1.0. While this is true, it has nothing to do with how the calculation works. To understand it, you need to know how non-negative floats are stored. f = (1+m)*2^x, with 0 <= m < 1 and m being the mantissa, x the exponent. Also note that x is stored with a bias, so what's actually in the binary is x+127. The 32 bit value is made up from the sign bit (which is zero in our case) followed by 8 bits of exponent storing x+127 and finally by 23 bits of mantissa, m. (See the wikipedia article).

Apply some basic math,

sqrt(f) = sqrt((1+m)*2^x)
        = sqrt(1+m)*sqrt(2^x)
        = sqrt(1+m)*2^(x/2)

So, as a rough approximation, we need to halve the exponent but due to the bias we can't just do x/2 we need (x-127)/2 + 127. This 127 shifted to the appropriate bit position is the magic 0x3f800000.

The division by 2 is achieved using a right shift by one bit. Since this operates on the whole float, it has a side effect on the mantissa too.

First, assume the original exponent was even. Then the least significant bit that gets shifted out is zero. Thus, the mantissa gets halved too, so what we end up is: sqrt(f) = (1+m/2)*2^(x/2). We got the exponent correct, but the mantissa is (1+m/2) instead of sqrt(1+m). The maximum relative error for this is (1.5 - sqrt(2))/sqrt(2) ~ 6% which occurs if m is almost 1 meaning f is near, but less than an odd power of 2. Take for example f=7.99. The formula gives us about 2.998 instead of 2.827 which indeed has an error of 6%.

Now, if the exponent was odd, then the least significant bit will be 1 and this when shifted into the mantissa will cause an increase by half. Thus, we get sqrt(f) = (1.5+m/2)*2^((x-1)/2). The maximum error for this is actually when m=0, and that will be (1.5/sqrt(2)-sqrt(1))/sqrt(1) which is again around 6%. This occurs for numbers that are near to an odd power of two from above.

The two cases combined mean the worst inaccuracy is around 6% if the input value happens to be near an odd power of two. For even powers of two, the result is accurate.

How does this sqrt approximation inline assembly function work?

Tags:

optimization

x86

assembly

square-root

sar

vexe

1 Answers

Jester

Recent Activity

Donate For Us

How does this sqrt approximation inline assembly function work?

Tags:

optimization

x86

assembly

square-root

sar

vexe

1 Answers

Jester

Related questions

Recent Activity

Donate For Us