I'm looking for fast code for 64-bit (unsigned) cube roots. (I'm using C and compiling with gcc, but I imagine most of the work required will be language- and compiler-agnostic.) I will denote by ulong a 64-bit unisgned integer.
Given an input n I require the (integral) return value r to be such that
r * r * r <= n && n < (r + 1) * (r + 1) * (r + 1)
That is, I want the cube root of n, rounded down. Basic code like
return (ulong)pow(n, 1.0/3);
is incorrect because of rounding toward the end of the range. Unsophisticated code like
ulong
cuberoot(ulong n)
{
ulong ret = pow(n + 0.5, 1.0/3);
if (n < 100000000000001ULL)
return ret;
if (n >= 18446724184312856125ULL)
return 2642245ULL;
if (ret * ret * ret > n) {
ret--;
while (ret * ret * ret > n)
ret--;
return ret;
}
while ((ret + 1) * (ret + 1) * (ret + 1) <= n)
ret++;
return ret;
}
gives the correct result, but is slower than it needs to be.
This code is for a math library and it will be called many times from various functions. Speed is important, but you can't count on a warm cache (so suggestions like a 2,642,245-entry binary search are right out).
For comparison, here is code that correctly calculates the integer square root.
ulong squareroot(ulong a) {
ulong x = (ulong)sqrt((double)a);
if (x > 0xFFFFFFFF || x*x > a)
x--;
return x;
}
The book "Hacker's Delight" has algorithms for this and many other problems. The code is online here. EDIT: That code doesn't work properly with 64-bit ints, and the instructions in the book on how to fix it for 64-bit are somewhat confusing. A proper 64-bit implementation (including test case) is online here.
I doubt that your squareroot
function works "correctly" - it should be ulong a
for the argument, not n
:) (but the same approach would work using cbrt
instead of sqrt
, although not all C math libraries have cube root functions).
I've adapted the algorithm presented in 1.5.2
(the kth root) in Modern Computer Arithmetic (Brent and Zimmerman). For the case of (k == 3)
, and given a 'relatively' accurate over-estimate of the initial guess - this algorithm seems to out-perform the 'Hacker's Delight' code above.
Not only that, but MCA as a text provides theoretical background as well as a proof of correctness and terminating criteria.
Provided that we can produce a 'relatively' good initial over-estimate, I haven't been able to find a case that exceeds (7) iterations. (Is this effectively related to 64-bit values having 2^6 bits?) Either way, it's an improvement over the (21) iterations in the HacDel code - with linear O(b) convergence, despite having a loop body that is evidently much faster.
The initial estimate I've used is based on a 'rounding up' of the number of significant bits in the value (x). Given (b) significant bits in (x), we can say: 2^(b - 1) <= x < 2^b
. I state without proof (though it should be relatively easy to demonstrate) that: 2^ceil(b / 3) > x^(1/3)
static inline uint32_t u64_cbrt (uint64_t x)
{
uint64_t r0 = 1, r1;
/* IEEE-754 cbrt *may* not be exact. */
if (x == 0) /* cbrt(0) : */
return (0);
int b = (64) - __builtin_clzll(x);
r0 <<= (b + 2) / 3; /* ceil(b / 3) */
do /* quadratic convergence: */
{
r1 = r0;
r0 = (2 * r1 + x / (r1 * r1)) / 3;
}
while (r0 < r1);
return ((uint32_t) r1); /* floor(cbrt(x)); */
}
A crbt
call probably isn't all that useful - unlike the sqrt
call which can be efficiently implemented on modern hardware. That said, I've seen gains for sets of values under 2^53
(exactly represented in IEEE-754 doubles), which surprised me.
The only downside is the division by: (r * r)
- this can be slow, as the latency of integer division continues to fall behind other advances in ALUs. The division by a constant: (3)
is handled by reciprocal methods on any modern optimising compiler.
It's interesting that Intel's 'Icelake' microarchitecture will significantly improve integer division - an operation that seems to have been neglected for a long time. I simply won't trust the 'Hacker's Delight' answer until I can find a sound theoretical basis for it. And then I have to work out which variant is the 'correct' answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With