Fastest computation of sum x^5 + x^4 + x^3...+x^0 (Bitwise possible ?) with x=16

Question

For a tree layout that takes benefit of cache line prefetching (reading _next_ cacheline is cheap), I need to solve the address calculation in a really fast way. I was able to boil down the problem to:

newIndex = nowIndex + 1 + (localChildIndex*X)

x would be for example: X = 4⁵ + 4⁴ + 4³ + 4² +4⁰.

Note: 4 is the branching factor. In reality it will be 16, so a power of 2. This should be useful to use bitwise stuff?

It would be very bad if it would need a loop to calculate X (performancewise) and stuff like division/multiplication. This appeals to be an interesting problem which I wasn’t able to come up with some nice way of computing it.

Since its part of a tree traversal, 2 modes would be possible: absolute calculation, independent of prior calculations AND incremental calculation which starts with a high X being kept in a variable and then some minimal stuff done to it with every deeper level of the tree.

I hope I was able to make clear what the math should do. Not sure if there is a way to do this fast & without loop - but maybe somebody can come up with a really smart solution. I would like to thank everybody for their help - StackOverflow have been a great teacher to me in the past and I hope to be able to give back more in the future, as my knowledge increases.

DSquare · Accepted Answer

I'll answer this in increasing complexity and generality.

If x is fixed to 16 then just use a constant value 1118481. Hooray! (Name it, using magical numbers is bad practice)
If you have a few cases known at compile time use a few constants or even defines, for example:
```
#define X_2 63
#define X_4 1365
#define X_8 37449
#define X_16 1118481
...
```
If you have several cases known at execution time initialize and use a lookup table indexed with the exponent.
```
int _X[MAX_EXPONENT]; // note: give it a more meaningful name :)
```
Initialize it and then just access with the known exponent of 2^exp at execution time.
```
newIndex = nowIndex + 1 + (localChildIndex*_X[exp]);
```
How are these values precalculated, or how to calculate them efficiently on the fly: The sum X = x^n + x^(n - 1) + ... + x^1 + x^0 is a geometric serie and its finite sum is:
```
X = x^n + x^(n - 1) + ... + x^1 + x^0 = (1-x^(n + 1))/(1-x)
```
About the bitwise operations, as Oli Charlesworth has stated if x is a power of 2 (in binary 0..010..0) x^n is also a power of 2, and the sum of different powers of two is equivalent to the OR operation. Thus we could make an expression like:

Let exp be the exponent so that x = 2^exp. (For 16, exp = 4). Then,
```
X = x^5 + ... + x^1 + x^0
X = (2^exp)^5 + ... + (2^exp)^1 + 1
X = 2^(exp*5) + ... + 2^(exp*1) + 1
```
now using bitwise, 2^n = 1<<n
```
X = 1<<(exp*5) | ... | 1<<exp | 1
```
In C:
```
int X;
int exp = 4; //for x == 16
X = 1 << (exp*5) | 1 << (exp*4) | 1 << (exp*3) | 1 << (exp*2) | 1 << (exp*1) | 1;
```
And finally, I can't resist to say: if your expression were more complex and you had to evaluate an arbitrary polynomial a_n*x^n + ... + a_1*x^1 + a_0 in x, instead of implementing the obvious loop, a faster way to evaluate the polynomial is using the Horner's rule.

Fastest computation of sum x^5 + x^4 + x^3...+x^0 (Bitwise possible ?) with x=16

Tags:

c

optimization

math

assembly

hpc

user1610743

1 Answers

DSquare

Recent Activity

Donate For Us

Fastest computation of sum x^5 + x^4 + x^3...+x^0 (Bitwise possible ?) with x=16

Tags:

c

optimization

math

assembly

hpc

user1610743

1 Answers

DSquare

Related questions

Recent Activity

Donate For Us