As we know if n
is not a perfect square, then sqrt(n)
would not be an integer. Since I need only the integer part, I feel that calling sqrt(n)
wouldn't be that fast, as it takes time to calculate the fractional part also.
So my question is,
Can we get only the integer part of sqrt(n) without calculating the actual value of sqrt(n)
? The algorithm should be faster than sqrt(n)
(defined in <math.h>
or <cmath>
)?
If possible, you can write the code in asm
block also.
C++ sqrt() In this tutorial, we will learn about the sqrt() function in C++ with the help of examples. The sqrt() function in C++ returns the square root of a number. This function is defined in the cmath header file.
√−1=i is an imaginary number, not lying anywhere on the real number line. Therefore as others have said, it is neither rational nor irrational in the usual senses of those words.
I would try the Fast Inverse Square Root trick.
It's a way to get a very good approximation of 1/sqrt(n)
without any branch, based on some bit-twiddling so not portable (notably between 32-bits and 64-bits platforms).
Once you get it, you just need to inverse the result, and takes the integer part.
There might be faster tricks, of course, since this one is a bit of a round about.
EDIT: let's do it!
First a little helper:
// benchmark.h #include <sys/time.h> template <typename Func> double benchmark(Func f, size_t iterations) { f(); timeval a, b; gettimeofday(&a, 0); for (; iterations --> 0;) { f(); } gettimeofday(&b, 0); return (b.tv_sec * (unsigned int)1e6 + b.tv_usec) - (a.tv_sec * (unsigned int)1e6 + a.tv_usec); }
Then the main body:
#include <iostream> #include <cmath> #include "benchmark.h" class Sqrt { public: Sqrt(int n): _number(n) {} int operator()() const { double d = _number; return static_cast<int>(std::sqrt(d) + 0.5); } private: int _number; }; // http://www.codecodex.com/wiki/Calculate_an_integer_square_root class IntSqrt { public: IntSqrt(int n): _number(n) {} int operator()() const { int remainder = _number; if (remainder < 0) { return 0; } int place = 1 <<(sizeof(int)*8 -2); while (place > remainder) { place /= 4; } int root = 0; while (place) { if (remainder >= root + place) { remainder -= root + place; root += place*2; } root /= 2; place /= 4; } return root; } private: int _number; }; // http://en.wikipedia.org/wiki/Fast_inverse_square_root class FastSqrt { public: FastSqrt(int n): _number(n) {} int operator()() const { float number = _number; float x2 = number * 0.5F; float y = number; long i = *(long*)&y; //i = (long)0x5fe6ec85e7de30da - (i >> 1); i = 0x5f3759df - (i >> 1); y = *(float*)&i; y = y * (1.5F - (x2*y*y)); y = y * (1.5F - (x2*y*y)); // let's be precise return static_cast<int>(1/y + 0.5f); } private: int _number; }; int main(int argc, char* argv[]) { if (argc != 3) { std::cerr << "Usage: %prog integer iterations\n"; return 1; } int n = atoi(argv[1]); int it = atoi(argv[2]); assert(Sqrt(n)() == IntSqrt(n)() && Sqrt(n)() == FastSqrt(n)() && "Different Roots!"); std::cout << "sqrt(" << n << ") = " << Sqrt(n)() << "\n"; double time = benchmark(Sqrt(n), it); double intTime = benchmark(IntSqrt(n), it); double fastTime = benchmark(FastSqrt(n), it); std::cout << "Number iterations: " << it << "\n" "Sqrt computation : " << time << "\n" "Int computation : " << intTime << "\n" "Fast computation : " << fastTime << "\n"; return 0; }
And the results:
sqrt(82) = 9 Number iterations: 4096 Sqrt computation : 56 Int computation : 217 Fast computation : 119 // Note had to tweak the program here as Int here returns -1 :/ sqrt(2147483647) = 46341 // real answer sqrt(2 147 483 647) = 46 340.95 Number iterations: 4096 Sqrt computation : 57 Int computation : 313 Fast computation : 119
Where as expected the Fast computation performs much better than the Int computation.
Oh, and by the way, sqrt
is faster :)
(int) sqrt(i)
After profiling with proper settings (-march=native -m64 -O3
) the above was a lot faster.
Alright, a bit old question, but the "fastest" answer has not been given yet. The fastest (I think) is the Binary Square Root algorithm, explained fully in this Embedded.com article.
It basicly comes down to this:
unsigned short isqrt(unsigned long a) { unsigned long rem = 0; int root = 0; int i; for (i = 0; i < 16; i++) { root <<= 1; rem <<= 2; rem += a >> 30; a <<= 2; if (root < rem) { root++; rem -= root; root++; } } return (unsigned short) (root >> 1); }
On my machine (Q6600, Ubuntu 10.10) I profiled by taking the square root of the numbers 1-100000000. Using iqsrt(i)
took 2750 ms. Using (unsigned short) sqrt((float) i)
took 3600ms. This was done using g++ -O3
. Using the -ffast-math
compile option the times were 2100ms and 3100ms respectively. Note this is without using even a single line of assembler so it could probably still be much faster.
The above code works for both C and C++ and with minor syntax changes also for Java.
What works even better for a limited range is a binary search. On my machine this blows the version above out of the water by a factor 4. Sadly it's very limited in range:
#include <stdint.h> const uint16_t squares[] = { 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000, 10201, 10404, 10609, 10816, 11025, 11236, 11449, 11664, 11881, 12100, 12321, 12544, 12769, 12996, 13225, 13456, 13689, 13924, 14161, 14400, 14641, 14884, 15129, 15376, 15625, 15876, 16129, 16384, 16641, 16900, 17161, 17424, 17689, 17956, 18225, 18496, 18769, 19044, 19321, 19600, 19881, 20164, 20449, 20736, 21025, 21316, 21609, 21904, 22201, 22500, 22801, 23104, 23409, 23716, 24025, 24336, 24649, 24964, 25281, 25600, 25921, 26244, 26569, 26896, 27225, 27556, 27889, 28224, 28561, 28900, 29241, 29584, 29929, 30276, 30625, 30976, 31329, 31684, 32041, 32400, 32761, 33124, 33489, 33856, 34225, 34596, 34969, 35344, 35721, 36100, 36481, 36864, 37249, 37636, 38025, 38416, 38809, 39204, 39601, 40000, 40401, 40804, 41209, 41616, 42025, 42436, 42849, 43264, 43681, 44100, 44521, 44944, 45369, 45796, 46225, 46656, 47089, 47524, 47961, 48400, 48841, 49284, 49729, 50176, 50625, 51076, 51529, 51984, 52441, 52900, 53361, 53824, 54289, 54756, 55225, 55696, 56169, 56644, 57121, 57600, 58081, 58564, 59049, 59536, 60025, 60516, 61009, 61504, 62001, 62500, 63001, 63504, 64009, 64516, 65025 }; inline int isqrt(uint16_t x) { const uint16_t *p = squares; if (p[128] <= x) p += 128; if (p[ 64] <= x) p += 64; if (p[ 32] <= x) p += 32; if (p[ 16] <= x) p += 16; if (p[ 8] <= x) p += 8; if (p[ 4] <= x) p += 4; if (p[ 2] <= x) p += 2; if (p[ 1] <= x) p += 1; return p - squares; }
A 32 bit version can be downloaded here: https://gist.github.com/3481770
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With