Optimizing sqrt(n) - sqrt(n-1)

Question

Here is function that I call many times per second:

static inline double calculate_scale(double n) { //n may be int or double
    return sqrt(n) - sqrt(n-1);
}

Called in loop like:

for(double i = 0; i < x; i++) {
    double scale = calculate_scale(i);
    ...
}

And it's so slow. What is the best way to optimize this function to get as accurate output as possible?

Parameter n: Starting from 1 up, practically not limited, but mainly used with small numbers in range 1-10. It's integer (whole number), but it may be both int or double, depending on what performs better.

StoryTeller - Unslander Monica · Accepted Answer

You can try to replace it with the following approximation

sqrt(n) - sqrt(n-1) == 
(sqrt(n) - sqrt(n-1)) * (sqrt(n) + sqrt(n-1)) / (sqrt(n) + sqrt(n-1)) ==
(n - (n + 1)) / (sqrt(n) + sqrt(n-1)) ==
1 / (sqrt(n) + sqrt(n-1))

For large enough n, the last equation is pretty close to 1 / (2 * sqrt(n)). So you only have to call sqrt once. It's also worth noting that even without the approximation, the last expression is more numerically stable in terms of relative error for larger n.

ttdado · Answer

First of all, thanks for all suggestions. I've done some research and found some interesting implementations and facts.

1. In Loop or Using Precomputed table

(thanks @Ulysse BN) You can optimize loop by simply saving previous sqrt(n) value. Following example demonstrates this optimization used to setup precomputed table.

    /**
     * Init variables
     *      i       counter
     *      x       number of cycles (size of table)
     *      sqrtI1  previous square root = sqrt(i-1)
     *      ptr     Pointer for next value
     */
    double i, x = sizeof(precomputed_table) / sizeof(double);
    double sqrtI1 = 0;

    double* ptr = (double*) precomputed_table;

    /**
     * Optimized calculation
     * In short:
     *      scale = sqrt(i) - sqrt(i-1)
     */
    for(i = 1; i <= x; i++) {
        double sqrtI = sqrt(i);
        double scale = sqrtI - sqrtI1; 
        *ptr++ = scale;
        sqrtI1 = sqrtI;
    }

Using precomputed table is probably the fastest method, but it's drawback may be that it's size is limited.

static inline double calculate_scale(int n) {
    return precomputed_table[n-1];
}

2. Approximation For BIG numbers using Inverse Square Root

Required Inverse (reciprocal) Square Root function rsqrt

This method has most accurate results with big numbers. With small numbers there are errors:

1    2     3      10       100     1000
0.29 0.006 0.0016 0.000056 1.58e-7 4.95e-10

Here is JS code that I used to calculate results above:

function sqrt(x) { return Math.sqrt(x); } function d(x) { return (sqrt(x)-sqrt(x-1))-(0.5/sqrt(x-0.5));} console.log(d(1), d(2), d(3), d(10), d(100), d(1000));

You can also see accuracy compared with two-sqrt version in single graph: https://www.google.com/search?q=(sqrt(x)-sqrt(x-1))-(0.5%2Fsqrt(x-0.5))

Usage:

static inline double calculate_scale(double n) {
    //Same as: 0.5 / sqrt(n-0.5)
    //but lot faster
    return 0.5 * rsqrt(n-0.5);
}

On some older cpus (with slow or no hardware square root) you may go even faster using floats and Fast inverse square root from Quake:

static inline float calculate_scale(float n) {
    return 0.5 * Q_rsqrt(n-0.5);
}

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck? 
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

For more info about implementation, see https://en.wikipedia.org/wiki/Fast_inverse_square_root and http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf . Not recommended to use on modern cpus with hardware reciprocal square root.

Not always solution: 0.5 / sqrt(n-0.5)

Please note that on some processors (eg. ARM Cortex A9, Intel Core2) division takes nearly same time as hardware square root, so it's best to use original function with 2 square roots sqrt(n) - sqrt(n-1) OR reciprocal square root with multiply instead 0.5 * rsqrt(n-0.5) if exist.

3. Using Precomputed table with fallback

This method is good compromise between first 2 solutions. It has both good accuracy and performance.

static inline double calculate_scale(double n) {
    if(n <= sizeof_precomputed_table) {
        int nIndex = (int) n;
        return precomputed_table[nIndex-1];
    }
    //Multiply + Inverse Square root
    return 0.5 * rsqrt(n-0.5);
    //OR
    return sqrt(n) - sqrt(n-1);
}

In my case I need really accurate numbers, so my precomputed table size is 2048.

Any feedback is welcomed.

Optimizing sqrt(n) - sqrt(n-1)

Tags:

c

optimization

mathematical-optimization

ttdado

2 Answers

StoryTeller - Unslander Monica

1. In Loop or Using Precomputed table

2. Approximation For BIG numbers using Inverse Square Root

3. Using Precomputed table with fallback

ttdado

Recent Activity

Donate For Us

Optimizing sqrt(n) - sqrt(n-1)

Tags:

c

optimization

mathematical-optimization

ttdado

2 Answers

StoryTeller - Unslander Monica

1. In Loop or Using Precomputed table

2. Approximation For BIG numbers using Inverse Square Root

3. Using Precomputed table with fallback

ttdado

Related questions

Recent Activity

Donate For Us