
How does the GCC implementation of modulo (%) work, and why does it not use the div instruction?

I was trying to work out how to calculate modulo 10 in assembly, so I compiled the following C code with GCC to see what it came up with.

unsigned int i=999;
unsigned int j=i%10;

To my surprise I got

movl    -4(%ebp), %ecx
movl    $-858993459, %edx
movl    %ecx, %eax
mull    %edx
shrl    $3, %edx
movl    %edx, %eax
sall    $2, %eax
addl    %edx, %eax
addl    %eax, %eax
movl    %ecx, %edx
subl    %eax, %edx
movl    %edx, %eax
movl    %eax, -12(%ebp)

Where -4(%ebp) or "i" is the input and -12(%ebp) or "j" is the answer. I've tested this and it does work no matter what number you make -4(%ebp).

My question is: how does this code work, and how is it better than using the div instruction?

St0ner asked Dec 05 '10 23:12



2 Answers

Second question first: div is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.

The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).

The next few instructions multiply the result by 10 again, but avoiding the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).

movl    %edx, %eax   ; eax=i/10
sall    $2, %eax     ; eax=(i/10)*4
addl    %edx, %eax   ; eax=(i/10)*4 + (i/10) = (i/10)*5
addl    %eax, %eax   ; eax=(i/10)*5*2 = (i/10)*10

This is then subtracted from i again to obtain i - (i/10)*10 which is i % 10 (for unsigned numbers).
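As a rough C sketch of the multiply-back-and-subtract step described above (the function names are mine, not anything GCC emits, and the quotient is simply passed in):

```c
#include <stdint.h>

/* Multiply by 10 with one shift and two adds, mirroring the
 * sall/addl/addl sequence. Hypothetical helper name. */
static uint32_t times10_shift_add(uint32_t q)
{
    uint32_t t = q << 2;  /* q * 4           (sall $2, %eax)   */
    t += q;               /* q * 4 + q = q*5 (addl %edx, %eax) */
    t += t;               /* q * 5 * 2 = q*10 (addl %eax, %eax) */
    return t;
}

/* Remainder from an already computed quotient q = i / 10:
 * i - (i / 10) * 10, which is i % 10 for unsigned values. */
static uint32_t rem10_from_quotient(uint32_t i, uint32_t q)
{
    return i - times10_shift_add(q);
}
```

For example, rem10_from_quotient(999, 99) subtracts 990 from 999 and yields 9.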

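Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1) and right-shifting the result by 35. That multiplier, 3435973837 (0xCCCCCCCD), is the magic value loaded into edx; it's printed as the signed value -858993459 even though it's really unsigned. The shift by 35 happens in two steps: mull leaves the high 32 bits of the 64-bit product in edx, which already amounts to a shift by 32, and shrl $3 supplies the remaining 3. This turns out to give the right result for all 32-bit unsigned integers.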
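The same trick can be written in plain C with a 64-bit intermediate product. This is only a sketch of what the assembly does, with function names I made up; the constant and the shift come straight from the listing in the question:

```c
#include <stdint.h>

/* i / 10 via fixed-point multiplication by roughly 1/10.
 * 0xCCCCCCCD is 3435973837 = 2^35 / 10 + 1, the same bit pattern
 * as the signed -858993459 in the assembly listing. */
static uint32_t div10_magic(uint32_t i)
{
    const uint64_t magic = 0xCCCCCCCDu;
    /* 64-bit product, then drop the low 35 bits:
     * 32 by taking the high half (edx), 3 more via shrl $3. */
    return (uint32_t)(((uint64_t)i * magic) >> 35);
}

/* i % 10 as i - (i / 10) * 10; valid for unsigned values. */
static uint32_t mod10_magic(uint32_t i)
{
    return i - div10_magic(i) * 10u;
}
```

With i = 999 this gives a quotient of 99 and a remainder of 9, matching i / 10 and i % 10.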

There are algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)
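To give a flavour of such an algorithm, here is a small search based on one well-known sufficient condition: for a multiplier m and a shift of 32 + s bits, the result is exact for every 32-bit unsigned input if m * d exceeds 2^(32+s) by at most 2^s. This is a sketch under that assumption, not necessarily the procedure GCC itself follows, and find_magic_u32 is a hypothetical name. It only handles divisors whose multiplier fits in 32 bits.

```c
#include <stdint.h>

/* Search for a 32-bit multiplier and total shift so that
 * (uint64_t)n * magic >> shift == n / d for all 32-bit unsigned n.
 * Returns 1 on success, 0 if d is 0, 1, a power of two (use a
 * plain shift instead) or would need a multiplier wider than 32 bits. */
static int find_magic_u32(uint32_t d, uint32_t *magic, unsigned *shift)
{
    if (d < 2 || (d & (d - 1)) == 0)
        return 0;
    for (unsigned extra = 0; extra <= 30; ++extra) {
        uint64_t pow2 = (uint64_t)1 << (32 + extra);
        uint64_t m = pow2 / d + 1;        /* ceil(2^(32+extra) / d) */
        uint64_t excess = m * d - pow2;   /* how far m*d overshoots */
        if (m > UINT32_MAX)
            return 0;                     /* m only grows from here */
        if (excess <= ((uint64_t)1 << extra)) {
            *magic = (uint32_t)m;
            *shift = 32 + extra;
            return 1;
        }
    }
    return 0;
}
```

For d = 10 the search stops at a shift of 35 with the multiplier 3435973837: 10 * 3435973837 is 2^35 + 2, and 2 is within the allowed overshoot of 2^3 = 8. For d = 7 it reports failure, because that divisor needs a 33-bit multiplier and a slightly longer instruction sequence.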

Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values.
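A small C model of the div behaviour just described may make the register roles clearer. It is an illustration only, with made-up function names, and it reports failure in the two cases where the real instruction raises a divide error: a zero divisor, or a quotient that doesn't fit in 32 bits.

```c
#include <stdint.h>

/* Model of 32-bit div: divides the 64-bit value hi:lo (edx:eax)
 * by divisor, storing the quotient (eax) and remainder (edx).
 * Returns 0 where the hardware would fault, 1 otherwise. */
static int div32_model(uint32_t hi, uint32_t lo, uint32_t divisor,
                       uint32_t *quotient, uint32_t *remainder)
{
    if (divisor == 0 || hi >= divisor)
        return 0;  /* divide by zero, or quotient would exceed 32 bits */
    uint64_t dividend = ((uint64_t)hi << 32) | lo;
    *quotient = (uint32_t)(dividend / divisor);
    *remainder = (uint32_t)(dividend % divisor);
    return 1;
}

/* Modulo with a run-time divisor: clear the high half (edx = 0),
 * divide, and keep the remainder. Returning 0 for a zero divisor
 * is an arbitrary choice for this sketch. */
static uint32_t mod_variable(uint32_t i, uint32_t divisor)
{
    uint32_t q, r;
    if (!div32_model(0, i, divisor, &q, &r))
        return 0;
    return r;
}
```

Because the divisor arrives as a parameter here, a compiler looking at the equivalent i % divisor has no constant to build a magic multiplier from.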

Fabian Giesen answered Oct 25 '22 12:10


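The first part, up to shrl $3, %edx, implements a fast integer division by 10. There are a few different algorithms that work when the number by which you divide is known in advance. Note that 858993459 is about 0.2 * 2^32, so the constant -858993459, read as an unsigned 32-bit value, is 2^32 - 858993459 = 3435973837, or about 0.8 * 2^32. Taking the high half of the product and shifting right by 3 therefore multiplies by about 0.8 / 8 = 0.1. The reason to do this is that, even though there is an integer division instruction div/idiv in the instruction set, it's typically very slow, several times slower than multiplication.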

The second part calculates the remainder by multiplying the result of division by 10 (in an indirect way, via shifts and adds; presumably the compiler thinks that it will be faster that way) and then subtracting that from the original number.

Eugene Smith answered Oct 25 '22 11:10