I was trying to work out how to calculate modulo 10 in assembly, so I compiled the following C code with GCC to see what it came up with.
unsigned int i = 999;
unsigned int j = i % 10;
To my surprise I got
movl -4(%ebp), %ecx
movl $-858993459, %edx
movl %ecx, %eax
mull %edx
shrl $3, %edx
movl %edx, %eax
sall $2, %eax
addl %edx, %eax
addl %eax, %eax
movl %ecx, %edx
subl %eax, %edx
movl %edx, %eax
movl %eax, -12(%ebp)
Where -4(%ebp) or "i" is the input and -12(%ebp) or "j" is the answer. I've tested this and it does work no matter what number you make -4(%ebp).
My question is: how does this code work, and how is it better than using the div instruction?
In languages such as Java, the modulus operator is not limited to integral types (e.g. byte, short, int, long); it also works on floating-point types like float and double. You can also use the remainder operator to check whether a number is even or odd, or whether a year is a leap year.
The modulus operator, sometimes also called the remainder operator or integer remainder operator, works on integers (and integer expressions) and yields the remainder when the first operand is divided by the second. In Python, the modulus operator is a percent sign ( % ). The syntax is the same as for other operators.
The modulus operator is one of the arithmetic operators in C, and it takes two operands. It divides the numerator by the denominator and yields the remainder of that integer division, so the result is always an integer.
The modulo operator divides the value of operand1 by the value of operand2 and returns the remainder after the division. Both operands must be absolute.
Second question first: div is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.
The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).
The next few instructions multiply the result by 10 again, but avoiding the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).
movl %edx, %eax   ; eax=i/10
sall $2, %eax     ; eax=(i/10)*4
addl %edx, %eax   ; eax=(i/10)*4 + (i/10) = (i/10)*5
addl %eax, %eax   ; eax=(i/10)*5*2 = (i/10)*10
This is then subtracted from i again to obtain i - (i/10)*10, which is i % 10 (for unsigned numbers).
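The multiply-back-and-subtract step can be written out in C as a rough sketch. The function names times10_shift_add and mod10_from_quotient are made up for illustration, and the quotient is computed with the ordinary / operator here, not with the compiler's trick.

```c
#include <stdint.h>

/* Multiply q by 10 using one shift and two adds, mirroring the
   sall/addl/addl sequence in the listing. Hypothetical helper name. */
uint32_t times10_shift_add(uint32_t q)
{
    uint32_t t = q << 2;  /* t = q*4          (sall $2, %eax)   */
    t += q;               /* t = q*4 + q = q*5 (addl %edx, %eax) */
    t += t;               /* t = q*5*2 = q*10  (addl %eax, %eax) */
    return t;
}

/* Remainder computed as i - (i/10)*10. The quotient uses plain '/'
   here; only the multiply-back step imitates the assembly. */
uint32_t mod10_from_quotient(uint32_t i)
{
    uint32_t q = i / 10;
    return i - times10_shift_add(q);  /* subl %eax, %edx */
}
```

For example, mod10_from_quotient(999) computes 999 - 990 and yields 9.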
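Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1) - that's the magic value loaded into edx, though it's output as a signed value even though it's really unsigned (-858993459 is the same bit pattern as 3435973837, or 0xCCCCCCCD) - and right-shifting the result by 35. The shift by 35 comes in two parts: mull leaves the 64-bit product in edx:eax, so using only edx discards the low 32 bits, and the shrl $3 shifts by the remaining 3. This turns out to give the right result for all 32-bit integers.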
There are algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)
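Here is a minimal C sketch of the multiply-and-shift division described above. The names DIV10_MAGIC and div10_by_multiply are illustrative only; the constant is the value from the listing read as an unsigned 32-bit number.

```c
#include <stdint.h>

/* The magic constant from the listing: -858993459 reinterpreted as an
   unsigned 32-bit value, i.e. 2^35/10 rounded up. Illustrative name. */
#define DIV10_MAGIC UINT32_C(0xCCCCCCCD)

/* Approximate i/10 as (i * magic) >> 35 using a 64-bit product. */
uint32_t div10_by_multiply(uint32_t i)
{
    /* mull: the full 64-bit product lands in edx:eax */
    uint64_t product = (uint64_t)i * DIV10_MAGIC;

    /* Keeping only edx is a shift by 32; shrl $3 adds 3 more. */
    return (uint32_t)(product >> 35);
}
```

To convince yourself, you can compare div10_by_multiply(i) against i / 10 in a loop over whatever range of inputs you care about.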
Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values.
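To make the register convention concrete, here is a small C model of what a 32-bit div does with edx:eax. This is only a sketch: struct div_result and model_div32 are hypothetical names, and the model assumes a nonzero divisor and a quotient that fits in 32 bits (the real instruction raises a divide error otherwise).

```c
#include <stdint.h>

/* Hypothetical struct holding the two outputs of div. */
struct div_result {
    uint32_t quotient;   /* what div leaves in eax */
    uint32_t remainder;  /* what div leaves in edx */
};

/* Model of "div divisor" with a 32-bit operand.
   Assumes divisor != 0 and that the quotient fits in 32 bits. */
struct div_result model_div32(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    /* The dividend is the 64-bit value edx:eax (edx = high half). */
    uint64_t dividend = ((uint64_t)edx << 32) | eax;

    struct div_result r;
    r.quotient  = (uint32_t)(dividend / divisor);
    r.remainder = (uint32_t)(dividend % divisor);
    return r;
}
```

With edx cleared to zero, model_div32(0, 999, 10) gives quotient 99 and remainder 9, which is the i % 10 from the question.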
The first part, up to shrl $3, %edx, implements a fast integer division by 10. There are a few different algorithms that work when the number by which you divide is known in advance. Note that the constant -858993459, read as an unsigned 32-bit value, is 3435973837, which is about "0.8 * 2^32"; taking the high 32 bits of the product and then shifting right by 3 (dividing by 8) amounts to multiplying by roughly 0.1. The reason to do this is that, even though there is an integer division instruction div/idiv in the instruction set, it's typically very slow, several times slower than multiplication.
The second part calculates the remainder by multiplying the result of division by 10 (in an indirect way, via shifts and adds; presumably the compiler thinks that it will be faster that way) and then subtracting that from the original number.