For a tree layout that takes benefit of cache line prefetching (reading _next_ cacheline
is cheap), I need to solve the address calculation in a really fast way. I was able to boil down the problem to:
newIndex = nowIndex + 1 + (localChildIndex*X)
x would be for example: X = 45 + 44 + 43 + 42 +40.
Note: 4 is the branching factor. In reality it will be 16, so a power of 2. This should be useful to use bitwise stuff?
It would be very bad if it would need a loop to calculate X (performancewise
) and stuff like division/multiplication. This appeals to be an interesting problem which I wasn’t able to come up with some nice way of computing it.
Since its part of a tree traversal, 2 modes would be possible: absolute calculation, independent of prior calculations AND incremental calculation which starts with a high X being kept in a variable and then some minimal stuff done to it with every deeper level of the tree.
I hope I was able to make clear what the math should do. Not sure if there is a way to do this fast & without loop - but maybe somebody can come up with a really smart solution. I would like to thank everybody for their help - StackOverflow have been a great teacher to me in the past and I hope to be able to give back more in the future, as my knowledge increases.
I'll answer this in increasing complexity and generality.
If x is fixed to 16 then just use a constant value 1118481
. Hooray! (Name it, using magical numbers is bad practice)
If you have a few cases known at compile time use a few constants or even defines, for example:
#define X_2 63
#define X_4 1365
#define X_8 37449
#define X_16 1118481
...
If you have several cases known at execution time initialize and use a lookup table indexed with the exponent.
int _X[MAX_EXPONENT]; // note: give it a more meaningful name :)
Initialize it and then just access with the known exponent of 2^exp at execution time.
newIndex = nowIndex + 1 + (localChildIndex*_X[exp]);
How are these values precalculated, or how to calculate them efficiently on the fly:
The sum X = x^n + x^(n - 1) + ... + x^1 + x^0
is a geometric serie and its finite sum is:
X = x^n + x^(n - 1) + ... + x^1 + x^0 = (1-x^(n + 1))/(1-x)
About the bitwise operations, as Oli Charlesworth has stated if x is a power of 2 (in binary 0..010..0) x^n
is also a power of 2, and the sum of different powers of two is equivalent to the OR operation. Thus we could make an expression like:
Let exp
be the exponent so that x = 2^exp. (For 16, exp = 4). Then,
X = x^5 + ... + x^1 + x^0
X = (2^exp)^5 + ... + (2^exp)^1 + 1
X = 2^(exp*5) + ... + 2^(exp*1) + 1
now using bitwise, 2^n = 1<<n
X = 1<<(exp*5) | ... | 1<<exp | 1
In C:
int X;
int exp = 4; //for x == 16
X = 1 << (exp*5) | 1 << (exp*4) | 1 << (exp*3) | 1 << (exp*2) | 1 << (exp*1) | 1;
And finally, I can't resist to say: if your expression were more complex and you had to evaluate an arbitrary polynomial a_n*x^n + ... + a_1*x^1 + a_0
in x, instead of implementing the obvious loop, a faster way to evaluate the polynomial is using the Horner's rule.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With