Here is function that I call many times per second:
static inline double calculate_scale(double n) { //n may be int or double
return sqrt(n) - sqrt(n-1);
}
Called in loop like:
for(double i = 0; i < x; i++) {
double scale = calculate_scale(i);
...
}
And it's so slow. What is the best way to optimize this function to get as accurate output as possible?
Parameter n: Starting from 1 up, practically not limited, but mainly used with small numbers in range 1-10. It's integer (whole number), but it may be both int or double, depending on what performs better.
You can try to replace it with the following approximation
sqrt(n) - sqrt(n-1) ==
(sqrt(n) - sqrt(n-1)) * (sqrt(n) + sqrt(n-1)) / (sqrt(n) + sqrt(n-1)) ==
(n - (n + 1)) / (sqrt(n) + sqrt(n-1)) ==
1 / (sqrt(n) + sqrt(n-1))
For large enough n, the last equation is pretty close to 1 / (2 * sqrt(n)). So you only have to call sqrt once. It's also worth noting that even without the approximation, the last expression is more numerically stable in terms of relative error for larger n.
First of all, thanks for all suggestions. I've done some research and found some interesting implementations and facts.
(thanks @Ulysse BN)
You can optimize loop by simply saving previous sqrt(n) value.
Following example demonstrates this optimization used to setup precomputed table.
/**
* Init variables
* i counter
* x number of cycles (size of table)
* sqrtI1 previous square root = sqrt(i-1)
* ptr Pointer for next value
*/
double i, x = sizeof(precomputed_table) / sizeof(double);
double sqrtI1 = 0;
double* ptr = (double*) precomputed_table;
/**
* Optimized calculation
* In short:
* scale = sqrt(i) - sqrt(i-1)
*/
for(i = 1; i <= x; i++) {
double sqrtI = sqrt(i);
double scale = sqrtI - sqrtI1;
*ptr++ = scale;
sqrtI1 = sqrtI;
}
Using precomputed table is probably the fastest method, but it's drawback may be that it's size is limited.
static inline double calculate_scale(int n) {
return precomputed_table[n-1];
}
Required Inverse (reciprocal) Square Root function rsqrt
This method has most accurate results with big numbers. With small numbers there are errors:
1 2 3 10 100 1000
0.29 0.006 0.0016 0.000056 1.58e-7 4.95e-10
Here is JS code that I used to calculate results above:
function sqrt(x) { return Math.sqrt(x); } function d(x) { return (sqrt(x)-sqrt(x-1))-(0.5/sqrt(x-0.5));} console.log(d(1), d(2), d(3), d(10), d(100), d(1000));
You can also see accuracy compared with two-sqrt version in single graph: https://www.google.com/search?q=(sqrt(x)-sqrt(x-1))-(0.5%2Fsqrt(x-0.5))
Usage:
static inline double calculate_scale(double n) {
//Same as: 0.5 / sqrt(n-0.5)
//but lot faster
return 0.5 * rsqrt(n-0.5);
}
On some older cpus (with slow or no hardware square root) you may go even faster using floats and Fast inverse square root from Quake:
static inline float calculate_scale(float n) {
return 0.5 * Q_rsqrt(n-0.5);
}
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
For more info about implementation, see https://en.wikipedia.org/wiki/Fast_inverse_square_root and http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf . Not recommended to use on modern cpus with hardware reciprocal square root.
Not always solution: 0.5 / sqrt(n-0.5)
Please note that on some processors (eg. ARM Cortex A9, Intel Core2)
division takes nearly same time as hardware square root,
so it's best to use original function with 2 square roots sqrt(n) - sqrt(n-1) OR
reciprocal square root with multiply instead 0.5 * rsqrt(n-0.5) if exist.
This method is good compromise between first 2 solutions. It has both good accuracy and performance.
static inline double calculate_scale(double n) {
if(n <= sizeof_precomputed_table) {
int nIndex = (int) n;
return precomputed_table[nIndex-1];
}
//Multiply + Inverse Square root
return 0.5 * rsqrt(n-0.5);
//OR
return sqrt(n) - sqrt(n-1);
}
In my case I need really accurate numbers, so my precomputed table size is 2048.
Any feedback is welcomed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With