Fastest way to get the integer part of sqrt(n)?

Tags:

As we know if n is not a perfect square, then sqrt(n) would not be an integer. Since I need only the integer part, I feel that calling sqrt(n) wouldn't be that fast, as it takes time to calculate the fractional part also.

So my question is,

Can we get only the integer part of sqrt(n) without calculating the actual value of sqrt(n)? The algorithm should be faster than sqrt(n) (defined in <math.h> or <cmath>)?

If possible, you can write the code in asm block also.

279

asked Feb 08 '11 06:02

Nawaz

2 Answers

I would try the Fast Inverse Square Root trick.

It's a way to get a very good approximation of 1/sqrt(n) without any branch, based on some bit-twiddling so not portable (notably between 32-bits and 64-bits platforms).

Once you get it, you just need to inverse the result, and takes the integer part.

There might be faster tricks, of course, since this one is a bit of a round about.

EDIT: let's do it!

First a little helper:

// benchmark.h #include <sys/time.h>  template <typename Func> double benchmark(Func f, size_t iterations) {   f();    timeval a, b;   gettimeofday(&a, 0);   for (; iterations --> 0;)   {     f();   }   gettimeofday(&b, 0);   return (b.tv_sec * (unsigned int)1e6 + b.tv_usec) -          (a.tv_sec * (unsigned int)1e6 + a.tv_usec); }

Then the main body:

#include <iostream>  #include <cmath>  #include "benchmark.h"  class Sqrt { public:   Sqrt(int n): _number(n) {}    int operator()() const   {     double d = _number;     return static_cast<int>(std::sqrt(d) + 0.5);   }  private:   int _number; };  // http://www.codecodex.com/wiki/Calculate_an_integer_square_root class IntSqrt { public:   IntSqrt(int n): _number(n) {}    int operator()() const    {     int remainder = _number;     if (remainder < 0) { return 0; }      int place = 1 <<(sizeof(int)*8 -2);      while (place > remainder) { place /= 4; }      int root = 0;     while (place)     {       if (remainder >= root + place)       {         remainder -= root + place;         root += place*2;       }       root /= 2;       place /= 4;     }     return root;   }  private:   int _number; };  // http://en.wikipedia.org/wiki/Fast_inverse_square_root class FastSqrt { public:   FastSqrt(int n): _number(n) {}    int operator()() const   {     float number = _number;      float x2 = number * 0.5F;     float y = number;     long i = *(long*)&y;     //i = (long)0x5fe6ec85e7de30da - (i >> 1);     i = 0x5f3759df - (i >> 1);     y = *(float*)&i;      y = y * (1.5F - (x2*y*y));     y = y * (1.5F - (x2*y*y)); // let's be precise      return static_cast<int>(1/y + 0.5f);   }  private:   int _number; };   int main(int argc, char* argv[]) {   if (argc != 3) {     std::cerr << "Usage: %prog integer iterations\n";     return 1;   }    int n = atoi(argv[1]);   int it = atoi(argv[2]);    assert(Sqrt(n)() == IntSqrt(n)() &&           Sqrt(n)() == FastSqrt(n)() && "Different Roots!");   std::cout << "sqrt(" << n << ") = " << Sqrt(n)() << "\n";    double time = benchmark(Sqrt(n), it);   double intTime = benchmark(IntSqrt(n), it);   double fastTime = benchmark(FastSqrt(n), it);    std::cout << "Number iterations: " << it << "\n"                "Sqrt computation : " << time << "\n"                "Int computation  : " << intTime << "\n"                "Fast computation : " << fastTime << "\n";    return 0; }

And the results:

sqrt(82) = 9 Number iterations: 4096 Sqrt computation : 56 Int computation  : 217 Fast computation : 119  // Note had to tweak the program here as Int here returns -1 :/ sqrt(2147483647) = 46341 // real answer sqrt(2 147 483 647) = 46 340.95 Number iterations: 4096 Sqrt computation : 57 Int computation  : 313 Fast computation : 119

Where as expected the Fast computation performs much better than the Int computation.

Oh, and by the way, sqrt is faster :)

answered Sep 19 '22 00:09

Matthieu M.

Edit: this answer is foolish - use `(int) sqrt(i)`

After profiling with proper settings (-march=native -m64 -O3) the above was a lot faster.

Alright, a bit old question, but the "fastest" answer has not been given yet. The fastest (I think) is the Binary Square Root algorithm, explained fully in this Embedded.com article.

It basicly comes down to this:

unsigned short isqrt(unsigned long a) {     unsigned long rem = 0;     int root = 0;     int i;      for (i = 0; i < 16; i++) {         root <<= 1;         rem <<= 2;         rem += a >> 30;         a <<= 2;          if (root < rem) {             root++;             rem -= root;             root++;         }     }      return (unsigned short) (root >> 1); }

On my machine (Q6600, Ubuntu 10.10) I profiled by taking the square root of the numbers 1-100000000. Using iqsrt(i) took 2750 ms. Using (unsigned short) sqrt((float) i) took 3600ms. This was done using g++ -O3. Using the -ffast-math compile option the times were 2100ms and 3100ms respectively. Note this is without using even a single line of assembler so it could probably still be much faster.

The above code works for both C and C++ and with minor syntax changes also for Java.

What works even better for a limited range is a binary search. On my machine this blows the version above out of the water by a factor 4. Sadly it's very limited in range:

#include <stdint.h>  const uint16_t squares[] = {     0, 1, 4, 9,     16, 25, 36, 49,     64, 81, 100, 121,     144, 169, 196, 225,     256, 289, 324, 361,     400, 441, 484, 529,     576, 625, 676, 729,     784, 841, 900, 961,     1024, 1089, 1156, 1225,     1296, 1369, 1444, 1521,     1600, 1681, 1764, 1849,     1936, 2025, 2116, 2209,     2304, 2401, 2500, 2601,     2704, 2809, 2916, 3025,     3136, 3249, 3364, 3481,     3600, 3721, 3844, 3969,     4096, 4225, 4356, 4489,     4624, 4761, 4900, 5041,     5184, 5329, 5476, 5625,     5776, 5929, 6084, 6241,     6400, 6561, 6724, 6889,     7056, 7225, 7396, 7569,     7744, 7921, 8100, 8281,     8464, 8649, 8836, 9025,     9216, 9409, 9604, 9801,     10000, 10201, 10404, 10609,     10816, 11025, 11236, 11449,     11664, 11881, 12100, 12321,     12544, 12769, 12996, 13225,     13456, 13689, 13924, 14161,     14400, 14641, 14884, 15129,     15376, 15625, 15876, 16129,     16384, 16641, 16900, 17161,     17424, 17689, 17956, 18225,     18496, 18769, 19044, 19321,     19600, 19881, 20164, 20449,     20736, 21025, 21316, 21609,     21904, 22201, 22500, 22801,     23104, 23409, 23716, 24025,     24336, 24649, 24964, 25281,     25600, 25921, 26244, 26569,     26896, 27225, 27556, 27889,     28224, 28561, 28900, 29241,     29584, 29929, 30276, 30625,     30976, 31329, 31684, 32041,     32400, 32761, 33124, 33489,     33856, 34225, 34596, 34969,     35344, 35721, 36100, 36481,     36864, 37249, 37636, 38025,     38416, 38809, 39204, 39601,     40000, 40401, 40804, 41209,     41616, 42025, 42436, 42849,     43264, 43681, 44100, 44521,     44944, 45369, 45796, 46225,     46656, 47089, 47524, 47961,     48400, 48841, 49284, 49729,     50176, 50625, 51076, 51529,     51984, 52441, 52900, 53361,     53824, 54289, 54756, 55225,     55696, 56169, 56644, 57121,     57600, 58081, 58564, 59049,     59536, 60025, 60516, 61009,     61504, 62001, 62500, 63001,     63504, 64009, 64516, 65025 };  inline int isqrt(uint16_t x) {     const uint16_t *p = squares;      if (p[128] <= x) p += 128;     if (p[ 64] <= x) p +=  64;     if (p[ 32] <= x) p +=  32;     if (p[ 16] <= x) p +=  16;     if (p[  8] <= x) p +=   8;     if (p[  4] <= x) p +=   4;     if (p[  2] <= x) p +=   2;     if (p[  1] <= x) p +=   1;      return p - squares; }

A 32 bit version can be downloaded here: https://gist.github.com/3481770

answered Sep 19 '22 00:09

orlp

Related questions
                            
                                Multithreading program stuck in optimized mode but runs normally in -O0
                            
                                Pointer to incomplete class type is not allowed
                            
                                How can I print out C++ map values?
                            
                                How to output a character as an integer through cout?
                            
                                How to use bitmask?
                            
                                Are 'new' and 'delete' getting deprecated in C++?
                            
                                invalid conversion from `void*' to `char*' when using malloc?
                            
                                Q_OBJECT throwing 'undefined reference to vtable' error [duplicate]
                            
                                C++ alignment when printing cout <<
                            
                                Performance hit from C++ style casts?
                            
                                C++ cast to derived class
                            
                                Is there a way to find all the functions exposed by a dll
                            
                                error LNK2019: unresolved external symbol _main referenced in function ___tmainCRTStartup
                            
                                If I want to round an unsigned integer to the closest smaller or equal even integer, can I divide by 2 then multiply by 2?
                            
                                Why override operator()?
                            
                                Error QApplication: no such file or directory
                            
                                cmath vs math.h (And similar c-prefixed vs .h extension headers)
                            
                                What is the recommended way to align memory in C++11
                            
                                Lambda functions as base classes
                            
                                "items list" or "item list"

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fastest way to get the integer part of sqrt(n)?

Tags:

c++

performance

c

algorithm

math

Nawaz

People also ask

2 Answers

Matthieu M.

Edit: this answer is foolish - use `(int) sqrt(i)`

orlp

Recent Activity

Donate For Us

Fastest way to get the integer part of sqrt(n)?

Tags:

c++

performance

c

algorithm

math

Nawaz

People also ask

2 Answers

Matthieu M.

Edit: this answer is foolish - use (int) sqrt(i)

orlp

Related questions

Recent Activity

Donate For Us

Edit: this answer is foolish - use `(int) sqrt(i)`