I recently encountered the following interview question:
How can you multiply a number by 7 in an efficient and optimized way?
I know that I can multiply by 8 (or left-shift by three bits) and then subtract the original value:
num = (num << 3) - num;
but are there any other solutions.
For a limited range, you can use a lookup table:
static unsigned int mult7[] = {0, 7, 14, 21, ...};
unsigned int three = 3;
unsigned int twenty_one = mult7[three];
This may sound silly (and it probably is for this specific case) but it's often handy for things where there is a real cost to calculation. I'm just not certain that multiplying by seven counts as one of those cases.
For a start, multiplying x
by 7 (or shifting x
three bits left then subtracting x
) is an operation that can be done entirely inside the CPU. With a table lookup, you might see a multiply-by-four (shift two bits left) followed by an add to get the right address, but then you have to access memory to do the actual lookup - even with caching and all the other wondrous tricks current CPUs are capable of, that's probably going to slow things down.
There's also a good chance that your compiler will already know all the tricks about how to multiply fast. If your seven is a constant (or const int
or equivalent), the compiler will probably already have chosen the fastest way and there's a good chance the compiler writers know a lot more about this sort of stuff than mere mortals :-) (a)
But for cases where the calculation cost is relatively high, computing the values once and embedding them in your code as a lookup table is one of the standard optimisation strategies (trade off time for space).
(a) Examine the following code:
#include <stdio.h>
static int mult7 (int num) {
return num * 7;
}
int main (int argc, char *argv[]) {
printf ("%d\n", mult7 (atoi (argv[1])));
return 0;
}
With normal compilation by gcc
, mult7
comes out as the shift left three and subtract trick:
_mult7:
pushl %ebp ; stack frame setup.
movl %esp, %ebp
movl 8(%ebp), %edx ; get value to edx
movl %edx, %eax ; and eax.
sall $3, %eax ; eax <- eax * 8.
subl %edx, %eax ; eax <- eax - edx.
popl %ebp ; stack frame teardown and return.
ret
At -O3
(what I like to call the insane optimisation level), the whole thing is inlined into main
with:
call _atoi
movl $LC0, (%esp)
leal 0(,%eax,8), %edx ; these two are the relevant instructions.
subl %eax, %edx
movl %edx, 4(%esp)
call _printf
Note that this inlining action is only possible due to the static nature of the function - if it were visible to the linker, it would have to be maintained as a separate function in case another object file needed to call it.
If you take off the static
, it does indeed keep it non-inlined with all the stack frame setup and teardown but it at least still uses the (presumably) more efficient trick mentioned below. You can get rid of the stack frame code in gcc
if you use -fomit-frame-pointer
provided this doesn't adversely affect the code but this is starting to delve into the dark side a little :-)
This trick is to use the LEA
instruction to set edx
to eax * 8
then subtracts eax
from that. Same theory as the sall/subl
at normal optimisation, just slightly different mechanics.
Bottom line, trust your compiler. If you want to multiply num
by 7, use the following:
num *= 7;
The chances are that whatever improvement you get from such an attempted micro-optimisation, you could get a far better improvement by looking at the macro level (algorithm and data structure selection and so forth).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With