Question says it all. Does anyone know if the following...
size_t div(size_t value) { const size_t x = 64; return value / x; }
...is optimized into?
size_t div(size_t value) { return value >> 6; }
Do compilers do this? (My interest lies in GCC). Are there situations where it does and others where it doesn't?
I would really like to know, because every time I write a division that could be optimized like this I spend some mental energy wondering whether precious nothings of a second are wasted doing a division where a shift would suffice.
Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
The C/C++ compiler compiles each source file separately and produces the corresponding object file. This means the compiler can only apply optimizations on a single source file rather than on the whole program. However, some important optimizations can be performed only by looking at the whole program.
With -Os the compiler optimizes to reduce the size of the binary rather than execution speed. If you do not specify an optimization option, GCC attempts to reduce compilation time and to make debugging always yield the result expected from reading the source code.
GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall optimization level is controlled by the command-line option -On, where n is the required optimization level; -O0 (the default) disables optimization.
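To see for yourself what a given level produces, you can ask GCC to emit assembly and inspect it. A minimal sketch (the file name div.cpp and the output names are just placeholders):

// div.cpp
#include <cstddef>

std::size_t div(std::size_t value) {
    const std::size_t x = 64;
    return value / x;   // does this become a shift?
}

// Generate assembly at different optimization levels and compare:
//   g++ -O0 -S div.cpp -o div_O0.s
//   g++ -O1 -S div.cpp -o div_O1.s
// Then look for a shrq (shift) versus a divq (hardware divide) in the output.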
Even with g++ -O0 (yes, -O0!), this happens. Your function compiles down to:
_Z3divm:
.LFB952:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        movq    %rdi, -24(%rbp)
        movq    $64, -8(%rbp)
        movq    -24(%rbp), %rax
        shrq    $6, %rax
        leave
        ret
Note the shrq $6, which is a right shift by 6 places.
With -O1, the unnecessary junk is removed:

_Z3divm:
.LFB1023:
        movq    %rdi, %rax
        shrq    $6, %rax
        ret
Results on g++ 4.3.3, x64.
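Incidentally, a bare shift is only an exact replacement here because size_t is unsigned; for signed types, division by 64 and a plain arithmetic shift disagree on negative values, so the compiler has to emit a small fix-up. A minimal sketch illustrating that (function names are just for illustration):

#include <cassert>
#include <cstddef>

std::size_t div_by_64(std::size_t value) { return value / 64; }
std::size_t shift_by_6(std::size_t value) { return value >> 6; }

int main() {
    // For unsigned types the two forms agree on every value,
    // which is why a bare shrq is enough.
    for (std::size_t v = 0; v < 1000000; ++v)
        assert(div_by_64(v) == shift_by_6(v));
    // For signed types they differ: -1 / 64 == 0, but -1 >> 6 == -1
    // (assuming arithmetic right shift), so signed division by a
    // power of 2 needs an extra adjustment before the shift.
}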
Most compilers will go even further than reducing division by powers of 2 into shifts - they'll often convert integer division by a constant into a series of multiplication, shift, and addition instructions to get the result instead of using the CPU's built-in divide instruction (if there even is one).
For example, MSVC converts division by 71 to the following:
// volatile int y = x / 71;
8b 0c 24          mov  ecx, DWORD PTR _x$[esp+8] ; load x into ecx
b8 49 b4 c2 e6    mov  eax, -423447479           ; magic happens starting here...
f7 e9             imul ecx                       ; edx:eax = x * 0xe6c2b449
03 d1             add  edx, ecx                  ; edx = x + edx
c1 fa 06          sar  edx, 6                    ; edx >>= 6 (with sign fill)
8b c2             mov  eax, edx                  ; eax = edx
c1 e8 1f          shr  eax, 31                   ; eax >>= 31 (no sign fill)
03 c2             add  eax, edx                  ; eax += edx
89 04 24          mov  DWORD PTR _y$[esp+8], eax
So, you get a divide by 71 with a multiply, a couple shifts and a couple adds.
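If you want to convince yourself that this multiply/shift/add sequence really divides by 71, here is a sketch that mirrors the instruction sequence above in C++ (the function name div71 is just illustrative; the magic constant is the one from the MSVC listing, and right-shifting negative values is assumed to behave like sar, as it does on mainstream compilers):

#include <cassert>
#include <cstdint>

std::int32_t div71(std::int32_t x) {
    const std::int32_t magic = -423447479;                      // 0xe6c2b449
    std::int64_t prod = static_cast<std::int64_t>(x) * magic;   // imul: edx:eax = x * magic
    std::int32_t hi = static_cast<std::int32_t>(prod >> 32);    // take the high half (edx)
    hi += x;                                                     // add edx, ecx
    hi >>= 6;                                                    // sar edx, 6
    std::int32_t sign =
        static_cast<std::int32_t>(static_cast<std::uint32_t>(hi) >> 31); // shr eax, 31
    return hi + sign;                                            // add eax, edx
}

int main() {
    // Spot-check the sequence against ordinary division.
    for (std::int32_t x = -100000; x <= 100000; ++x)
        assert(div71(x) == x / 71);
}

The final shr/add pair is the usual correction that makes the truncation round toward zero for negative dividends, matching C's integer-division semantics.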
For more details on what's going on, consult Henry Warren's "Hacker's Delight" book or its companion website.
There's an online added chapter that provides some additional information about division by constants using multiply/shift/add with magic numbers, and a page with a little JavaScript program that'll calculate the magic numbers you need.
The companion site for the book is well worth reading (as is the book) - particularly if you're interested in bit-level micro optimizations.
Another article that I discovered just now that discusses this optimization: http://blogs.msdn.com/devdev/archive/2005/12/12/502980.aspx