How does the GCC implementation of modulo (%) work, and why does it not use the div instruction?

I was trying to work out how to calculate modulo 10 in assembly, so I compiled the following C code with GCC to see what it came up with.

    unsigned int i=999;
    unsigned int j=i%10;

To my surprise I got

    movl    -4(%ebp), %ecx
    movl    $-858993459, %edx
    movl    %ecx, %eax
    mull    %edx
    shrl    $3, %edx
    movl    %edx, %eax
    sall    $2, %eax
    addl    %edx, %eax
    addl    %eax, %eax
    movl    %ecx, %edx
    subl    %eax, %edx
    movl    %edx, %eax
    movl    %eax, -12(%ebp)

Here -4(%ebp) holds the input "i" and -12(%ebp) holds the answer "j". I've tested this and it works no matter what number you put in -4(%ebp).

My question is: how does this code work, and how is it better than using the div instruction?

asked Dec 05 '10 23:12

St0ner

2 Answers

Second question first: div is a very slow instruction (often more than 20 clock cycles, depending on the processor). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.

The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).

The next few instructions multiply the result by 10 again, but avoiding the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).

    movl    %edx, %eax   ; eax=i/10
    sall    $2, %eax     ; eax=(i/10)*4
    addl    %edx, %eax   ; eax=(i/10)*4 + (i/10) = (i/10)*5
    addl    %eax, %eax   ; eax=(i/10)*5*2 = (i/10)*10

This is then subtracted from i again to obtain i - (i/10)*10 which is i % 10 (for unsigned numbers).
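The multiply-back and subtract steps can be sketched in C. This is only an illustration of the shift-and-add sequence described above; the function names `times10_shift_add` and `mod10_from_quotient` are made up for this example and do not come from GCC.

```c
#include <stdint.h>

/* Multiply by 10 with one shift and two adds, mirroring sall/addl/addl. */
uint32_t times10_shift_add(uint32_t q)
{
    uint32_t t = q << 2;  /* q * 4          (sall $2)   */
    t += q;               /* q * 4 + q = 5q (addl)      */
    t += t;               /* 5q * 2 = 10q   (addl)      */
    return t;
}

/* Given q = i / 10, recover the remainder as i - q * 10 (subl). */
uint32_t mod10_from_quotient(uint32_t i, uint32_t q)
{
    return i - times10_shift_add(q);
}
```

For example, `mod10_from_quotient(999, 99)` computes 999 - 990.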

Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1), using integer division. That is the magic value loaded into edx: 3435973837, or 0xCCCCCCCD, which the compiler output prints as the signed value -858993459 even though it's really unsigned. mull leaves the 64-bit product in edx:eax, so using edx alone amounts to a right shift by 32, and the shrl $3 brings the total shift to 35. This turns out to give the right result for all 32-bit unsigned integers.
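As a minimal sketch of the same idea in C, assuming a 64-bit intermediate product is available: the names `div10_magic` and `mod10_magic` are hypothetical, and the code mimics what the generated assembly does rather than reproducing GCC's internals.

```c
#include <stdint.h>

/* Divide by 10 via multiplication by a fixed-point reciprocal. */
uint32_t div10_magic(uint32_t i)
{
    /* 0xCCCCCCCD = 3435973837 = 2^35 / 10 + 1 (integer division).
       Interpreted as a signed 32-bit value it prints as -858993459. */
    const uint64_t magic = 0xCCCCCCCDu;
    uint64_t product = (uint64_t)i * magic;  /* what mull leaves in edx:eax */
    return (uint32_t)(product >> 35);        /* high word (>> 32), then shrl $3 */
}

/* i % 10 computed as i - (i / 10) * 10, without a division instruction. */
uint32_t mod10_magic(uint32_t i)
{
    return i - div10_magic(i) * 10u;
}
```

A quick way to gain confidence in it is to compare `div10_magic(i)` against `i / 10` in a loop over whatever range of inputs you care about.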

There are algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)
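For this particular constant the bound can be checked by hand. The following is a sketch of that check for the divisor 10 and 32-bit unsigned inputs only; it is not a description of the general algorithm GCC uses to pick constants.

```latex
\[
m = \left\lfloor \frac{2^{35}}{10} \right\rfloor + 1 = 3435973837,
\qquad 10\,m = 2^{35} + 2 .
\]
\[
\frac{i \cdot m}{2^{35}}
  = \frac{i\,(2^{35} + 2)}{10 \cdot 2^{35}}
  = \frac{i}{10} + \frac{i}{5 \cdot 2^{35}},
\qquad
0 \le \frac{i}{5 \cdot 2^{35}} < \frac{2^{32}}{5 \cdot 2^{35}} = \frac{1}{40}
\quad (0 \le i < 2^{32}).
\]
\[
i = 10q + r,\; 0 \le r \le 9
\;\Longrightarrow\;
q \le \frac{i \cdot m}{2^{35}} < q + \frac{9}{10} + \frac{1}{40} < q + 1
\;\Longrightarrow\;
\left\lfloor \frac{i \cdot m}{2^{35}} \right\rfloor = q .
\]
```

The extra term introduced by rounding the constant up is always below 1/40, which is too small to push the result past the next integer.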

Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values.
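A minimal example of the variable-divisor case, assuming you compile it with something like `gcc -S` and read the output. The function name `mod_var` is made up for illustration, and the exact instructions emitted depend on the GCC version, target and flags.

```c
/* The divisor is a run-time parameter, so the compiler cannot replace
   the division with a multiplication by a precomputed constant. */
unsigned int mod_var(unsigned int i, unsigned int divisor)
{
    return i % divisor;  /* on x86 this is where you would expect a div */
}
```

Calling `mod_var(999, 10)` gives the same result as the constant version; only the generated code differs.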

answered Oct 25 '22 12:10

Fabian Giesen

The first part, up to shrl $3, %edx, implements a fast integer division by 10. There are a few different algorithms that work when the number by which you divide is known in advance. Note that -858993459, read as an unsigned 32-bit value, is 3435973837, which is "0.8 * 2^32" rounded up: taking the high half of the product multiplies by 0.8, and the shift right by 3 then divides by 8, giving 0.1 overall. The reason to do this is that, even though there is an integer division instruction div/idiv in the instruction set, it's typically very slow, several times slower than multiplication.
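The two register-level steps can be mirrored in C to see the intermediate values. This is a sketch under the assumption that the constant is read as the unsigned value 3435973837 (0xCCCCCCCD, roughly 0.8 * 2^32); `high32_times_magic` and `div10_two_step` are hypothetical names.

```c
#include <stdint.h>

/* What ends up in edx after mull: the high 32 bits of i * 0xCCCCCCCD,
   which is i * 0.8 rounded down for 32-bit inputs. */
uint32_t high32_times_magic(uint32_t i)
{
    return (uint32_t)(((uint64_t)i * 0xCCCCCCCDu) >> 32);
}

/* shrl $3: divide the 0.8 * i value by 8 to get i / 10. */
uint32_t div10_two_step(uint32_t i)
{
    return high32_times_magic(i) >> 3;
}
```

For i = 999 the high word is 799 (999 * 0.8 = 799.2, rounded down), and 799 >> 3 is 99.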

The second part calculates the remainder by multiplying the result of division by 10 (in an indirect way, via shifts and adds; presumably the compiler thinks that it will be faster that way) and then subtracting that from the original number.

answered Oct 25 '22 11:10

Eugene Smith


Tags:

optimization

x86

gcc

assembly
