Here is function that I call many times per second:
static inline double calculate_scale(double n) { //n may be int or double
return sqrt(n) - sqrt(n-1);
}
Called in loop like:
for(double i = 0; i < x; i++) {
double scale = calculate_scale(i);
...
}
And it's so slow. What is the best way to optimize this function to get as accurate output as possible?
Parameter n
: Starting from 1 up, practically not limited, but mainly used with small numbers in range 1-10. It's integer (whole number), but it may be both int
or double
, depending on what performs better.
You can try to replace it with the following approximation
sqrt(n) - sqrt(n-1) ==
(sqrt(n) - sqrt(n-1)) * (sqrt(n) + sqrt(n-1)) / (sqrt(n) + sqrt(n-1)) ==
(n - (n + 1)) / (sqrt(n) + sqrt(n-1)) ==
1 / (sqrt(n) + sqrt(n-1))
For large enough n
, the last equation is pretty close to 1 / (2 * sqrt(n))
. So you only have to call sqrt
once. It's also worth noting that even without the approximation, the last expression is more numerically stable in terms of relative error for larger n
.
First of all, thanks for all suggestions. I've done some research and found some interesting implementations and facts.
(thanks @Ulysse BN)
You can optimize loop by simply saving previous sqrt(n)
value.
Following example demonstrates this optimization used to setup precomputed table.
/**
* Init variables
* i counter
* x number of cycles (size of table)
* sqrtI1 previous square root = sqrt(i-1)
* ptr Pointer for next value
*/
double i, x = sizeof(precomputed_table) / sizeof(double);
double sqrtI1 = 0;
double* ptr = (double*) precomputed_table;
/**
* Optimized calculation
* In short:
* scale = sqrt(i) - sqrt(i-1)
*/
for(i = 1; i <= x; i++) {
double sqrtI = sqrt(i);
double scale = sqrtI - sqrtI1;
*ptr++ = scale;
sqrtI1 = sqrtI;
}
Using precomputed table is probably the fastest method, but it's drawback may be that it's size is limited.
static inline double calculate_scale(int n) {
return precomputed_table[n-1];
}
Required Inverse (reciprocal) Square Root function rsqrt
This method has most accurate results with big numbers. With small numbers there are errors:
1 2 3 10 100 1000
0.29 0.006 0.0016 0.000056 1.58e-7 4.95e-10
Here is JS code that I used to calculate results above:
function sqrt(x) { return Math.sqrt(x); } function d(x) { return (sqrt(x)-sqrt(x-1))-(0.5/sqrt(x-0.5));} console.log(d(1), d(2), d(3), d(10), d(100), d(1000));
You can also see accuracy compared with two-sqrt version in single graph: https://www.google.com/search?q=(sqrt(x)-sqrt(x-1))-(0.5%2Fsqrt(x-0.5))
Usage:
static inline double calculate_scale(double n) {
//Same as: 0.5 / sqrt(n-0.5)
//but lot faster
return 0.5 * rsqrt(n-0.5);
}
On some older cpus (with slow or no hardware square root) you may go even faster using float
s and Fast inverse square root from Quake:
static inline float calculate_scale(float n) {
return 0.5 * Q_rsqrt(n-0.5);
}
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
For more info about implementation, see https://en.wikipedia.org/wiki/Fast_inverse_square_root and http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf . Not recommended to use on modern cpus with hardware reciprocal square root.
Not always solution: 0.5 / sqrt(n-0.5)
Please note that on some processors (eg. ARM Cortex A9, Intel Core2)
division takes nearly same time as hardware square root,
so it's best to use original function with 2 square roots sqrt(n) - sqrt(n-1)
OR
reciprocal square root with multiply instead 0.5 * rsqrt(n-0.5)
if exist.
This method is good compromise between first 2 solutions. It has both good accuracy and performance.
static inline double calculate_scale(double n) {
if(n <= sizeof_precomputed_table) {
int nIndex = (int) n;
return precomputed_table[nIndex-1];
}
//Multiply + Inverse Square root
return 0.5 * rsqrt(n-0.5);
//OR
return sqrt(n) - sqrt(n-1);
}
In my case I need really accurate numbers, so my precomputed table size is 2048.
Any feedback is welcomed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With