Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest computation of sum x^5 + x^4 + x^3...+x^0 (Bitwise possible ?) with x=16

For a tree layout that takes benefit of cache line prefetching (reading _next_ cacheline is cheap), I need to solve the address calculation in a really fast way. I was able to boil down the problem to:

newIndex = nowIndex + 1 + (localChildIndex*X)

x would be for example: X = 45 + 44 + 43 + 42 +40.

Note: 4 is the branching factor. In reality it will be 16, so a power of 2. This should be useful to use bitwise stuff?

It would be very bad if it would need a loop to calculate X (performancewise) and stuff like division/multiplication. This appeals to be an interesting problem which I wasn’t able to come up with some nice way of computing it.

Since its part of a tree traversal, 2 modes would be possible: absolute calculation, independent of prior calculations AND incremental calculation which starts with a high X being kept in a variable and then some minimal stuff done to it with every deeper level of the tree.

I hope I was able to make clear what the math should do. Not sure if there is a way to do this fast & without loop - but maybe somebody can come up with a really smart solution. I would like to thank everybody for their help - StackOverflow have been a great teacher to me in the past and I hope to be able to give back more in the future, as my knowledge increases.

like image 676
user1610743 Avatar asked Jan 03 '14 16:01

user1610743


1 Answers

I'll answer this in increasing complexity and generality.

  • If x is fixed to 16 then just use a constant value 1118481. Hooray! (Name it, using magical numbers is bad practice)

  • If you have a few cases known at compile time use a few constants or even defines, for example:

    #define X_2 63
    #define X_4 1365
    #define X_8 37449
    #define X_16 1118481
    ...
    
  • If you have several cases known at execution time initialize and use a lookup table indexed with the exponent.

    int _X[MAX_EXPONENT]; // note: give it a more meaningful name :)
    

    Initialize it and then just access with the known exponent of 2^exp at execution time.

    newIndex = nowIndex + 1 + (localChildIndex*_X[exp]);
    
  • How are these values precalculated, or how to calculate them efficiently on the fly: The sum X = x^n + x^(n - 1) + ... + x^1 + x^0 is a geometric serie and its finite sum is:

    X = x^n + x^(n - 1) + ... + x^1 + x^0 = (1-x^(n + 1))/(1-x)
    
  • About the bitwise operations, as Oli Charlesworth has stated if x is a power of 2 (in binary 0..010..0) x^n is also a power of 2, and the sum of different powers of two is equivalent to the OR operation. Thus we could make an expression like:

    Let exp be the exponent so that x = 2^exp. (For 16, exp = 4). Then,

    X = x^5 + ... + x^1 + x^0
    X = (2^exp)^5 + ... + (2^exp)^1 + 1
    X = 2^(exp*5) + ... + 2^(exp*1) + 1
    

    now using bitwise, 2^n = 1<<n

    X = 1<<(exp*5) | ... | 1<<exp | 1
    

    In C:

    int X;
    int exp = 4; //for x == 16
    X = 1 << (exp*5) | 1 << (exp*4) | 1 << (exp*3) | 1 << (exp*2) | 1 << (exp*1) | 1;
    
  • And finally, I can't resist to say: if your expression were more complex and you had to evaluate an arbitrary polynomial a_n*x^n + ... + a_1*x^1 + a_0 in x, instead of implementing the obvious loop, a faster way to evaluate the polynomial is using the Horner's rule.

like image 53
DSquare Avatar answered Sep 21 '22 17:09

DSquare