fast square root optimization?

If you check this very nice page:

http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

You'll see this program:

#define SQRT_MAGIC_F 0x5f3759df 
 float  sqrt2(const float x)
{
  const float xhalf = 0.5f*x;

  union // get bits for floating value
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_F - (u.i >> 1);  // gives initial guess y0
  return x*u.x*(1.5f - xhalf*u.x*u.x);// Newton step, repeating increases accuracy 
}

My question is: Is there any particular reason why this isn't implemented as:

#define SQRT_MAGIC_F 0x5f3759df 
 float  sqrt2(const float x)
{

  union // get bits for floating value
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = SQRT_MAGIC_F - (u.i >> 1);  // gives initial guess y0

  const float xux = x*u.x;

  return xux*(1.5f - .5f*xux*u.x);// Newton step, repeating increases accuracy 
}

As, from disassembly, I see one MUL less. Is there any purpose to having xhalf appear at all?

How much faster is fast inverse square root?

Overview of the code The algorithm was approximately four times faster than computing the square root with another method and calculating the reciprocal via floating-point division.

Is fast inverse square root still used?

Many people have also asked if it's still useful on modern processors. The most common answer is no, that this hack was useful 20+ years ago and that modern-day processors have better and faster ways to approximate the inverse square root.

How accurate is fast inverse square root?

A single Newton-Raphson iteration is performed to calculate a more accurate approximation of the inverse square root of the input. The result of the Newton-Raphson iteration is the return value of the function. The result is extremely accurate with a maximum error of 0.175%.

It could be that legacy floating point math, which used 80 bit registers, was more accurate when the multipliers where linked together in the last line as intermediate results where kept in 80 bit registers.

The first multiplication in the upper implementation takes place in parallel to the integer math that follows, they use different execution resources. The second function on the other hand looks faster but it's hard to tell if it really is because of the above. Also, the const float xux = x*u.x; statement reduces the result back to 32 bit float, which may reduce overall accuracy.

You could test these functions head to head and compare them to the sqrt function in math.h (use double not float). This way you can see which is faster and which is more accurate.

fast square root optimization?

Tags:

c

optimization

user1095108

People also ask

1 Answers

egur

Recent Activity

Donate For Us

fast square root optimization?

Tags:

c

optimization

user1095108

People also ask

1 Answers

egur

Related questions

Recent Activity

Donate For Us