I need to perform some integer divisions in the hot path of my code. I've already determined via profiling and cycle counting that the integer divisions are costing me. I'm hoping there's something I can do to strength reduce the divisions into something cheaper.
In this path, I am dividing by 2^n+1, where n is variable. Essentially I want to optimize this function to remove the division operator:
unsigned long compute(unsigned long a, unsigned int n)
{
return a / ((1 << n) + 1);
}
If I were dividing by 2^n, I would just replace the div with a shift-right by n. If I were dividing by a constant, I would let the compiler strength reduce that specific division, likely turning it into a multiply and some shifts.
Is there a similar optimization that applies to 2^n+1?
Edit: a here can be an arbitrary 64-bit integer. n takes only a few values between 10 and, say, 25. I can certainly precompute some values for each n, but not for a.
Since you can only shift an int
so many places, you can put all those cases into a choice of one of several divisions by a constant:
unsigned long compute(unsigned long a, unsigned int n)
{
// assuming a 32-bit architecture (making this work for 64-bits
// is left as an exercise for the reader):
switch (n) {
case 0: return a / ((1 << 0) + 1);
case 1: return a / ((1 << 1) + 1);
case 2: return a / ((1 << 2) + 1);
// cases 3 through 30...
case 31: return a / ((1 << 31) + 1);
}
}
So now each division is by a constant, which compilers will typically reduce to a series of multiply/shift/add instructions (as the question mentioned). See Does a c/c++ compiler optimize constant divisions by power-of-two value into shifts? for deatils.
You can replace integer division by a constant, by multiplication (modulo wordsize) with a magic number and a shift.
The magic numbers can be pre-calculated for known constants.
Since n can't take many values e.g. 0..31 it is "easy" to pre-calculate these magic numbers for all n and store it in a table with 32 elements.
Javascript Page for calculating the magic numbers
A good compiler can compute the magic numbers and replace integer division by multiplication and shift if the divisor is constant at compile time. Depending on how the rest of the code is structured around the performance critical code you could use macro or inline tricks to unroll for all possible values of n and let the compiler do the work of finding the magic numbers (similar to the answer with the switch, but I would put more code in the constant region otherwise it might be a tradeof not worth it -- branching can cost you performance also)
Detailed description together with code for calculating the magic numbers can be fund in the Book "Hackers Delight" by Henry S. Warren, Jr. (highly recommended must have book!) pp. 180ff.
Link to Google Books for the relevant chapter:
Chapter 10-9 Unsigned Division by Divisors >= 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With