Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fast integer square root approximation

Tags:

c

square-root

I am currently searching for a very fast integer square root approximation, where floor(sqrt(x)) <= veryFastIntegerSquareRoot(x) <= x

The square root routine is used for calculating prime numbers, which get considerably faster if only values below or equals sqrt(x) are checked for being a divisor of x.

What I am currently having is this function from Wikipedia, adjusted a small bit to work with 64-bit integers.

Because I have no other function to compare against (or more precise, the function is too precise for my purposes, and it probably takes more time, than being higher than the actual result.)

like image 288
Morten Avatar asked Dec 09 '15 19:12

Morten


2 Answers

Loopfree/jumpfree (well: almost ;-) Newton-Raphson:

/* static will allow inlining */
static unsigned usqrt4(unsigned val) {
    unsigned a, b;

    if (val < 2) return val; /* avoid div/0 */

    a = 1255;       /* starting point is relatively unimportant */

    b = val / a; a = (a+b) /2;
    b = val / a; a = (a+b) /2;
    b = val / a; a = (a+b) /2;
    b = val / a; a = (a+b) /2;

    return a;
}

For 64-bit ints you will need a few more steps (my guess: 6)

like image 111
wildplasser Avatar answered Nov 07 '22 03:11

wildplasser


Computing floor(sqrt(x)) Exactly

Here is my solution, which is based on the bit-guessing approach proposed on Wikipedia. Unfortunately the pseudo-code provided on Wikipedia has some errors so I had to make some adjustments:

unsigned char bit_width(unsigned long long x) {
    return x == 0 ? 1 : 64 - __builtin_clzll(x);
}

// implementation for all unsigned integer types
unsigned sqrt(const unsigned n) {
    unsigned char shift = bit_width(n);
    shift += shift & 1; // round up to next multiple of 2

    unsigned result = 0;

    do {
        shift -= 2;
        result <<= 1; // leftshift the result to make the next guess
        result |= 1;  // guess that the next bit is 1
        result ^= result * result > (n >> shift); // revert if guess too high
    } while (shift != 0);

    return result;
}

bit_width can be evaluated in constant time and the loop will iterate at most ceil(bit_width / 2) times. So even for a 64-bit integer, this will be at worst 32 iterations of basic arithmetic and bitwise operations.

Unlike all other answers proposed so far, this actually gives you the best possible approximation, which is floor(sqrt(x)). For any x2, this will return x exactly.

Making a Guess using log2(x)

If this is still too slow for you, you can make a guess solely based on the binary logarithm. The basic idea is that we can compute the sqrt of any number 2x using 2x/2. x/2 might have a remainder, so we can't always compute this exactly, but we can compute an upper and lower bound.

For example:

  1. we are given 25
  2. compute floor(log_2(25)) = 4
  3. compute ceil(log_2(25)) = 5
  4. lower bound: pow(2, floor(4 / 2)) = 4
  5. upper bound: pow(2, ceil(5 / 2)) = 8

And indeed, the actual sqrt(25) = 5. We found sqrt(16) >= 4 and sqrt(32) <= 8. This means:

4 <= sqrt(16) <= sqrt(25) <= sqrt(32) <= 8
            4 <= sqrt(25) <= 8

This is how we can implement these guesses, which we will call sqrt_lo and sqrt_hi.

// this function computes a lower bound
unsigned sqrt_lo(const unsigned n) noexcept
{
    unsigned log2floor = bit_width(n) - 1;
    return (unsigned) (n != 0) << (log2floor >> 1);
}

// this function computes an upper bound
unsigned sqrt_hi(const unsigned n)
{
    bool isnt_pow2 = ((n - 1) & n) != 0; // check if n is a power of 2
    unsigned log2ceil = bit_width(n) - 1 + isnt_pow2;
    log2ceil += log2ceil & 1; // round up to multiple of 2
    return (unsigned) (n != 0) << (log2ceil >> 1);
}

For these two functions, the following statement is always true:

sqrt_lo(x) <= floor(sqrt(x)) <= sqrt(x) <= sqrt_hi(x) <= x

Note that if we assume that the input is never zero, then (unsigned) (n != 0) can be simplified to 1 and the statement is still true.

These functions can be evaluated in O(1) with a hardware-__builtin_clzll instruction. They only give precise results for numbers 22x, so 256, 64, 16, etc.

like image 2
Jan Schultke Avatar answered Nov 07 '22 02:11

Jan Schultke