Assembly Language - How to do Modulo?

If your modulus / divisor is a known constant, and you care about performance, see this and this. A multiplicative inverse is even possible for loop-invariant values that aren't known until runtime, e.g. see https://libdivide.com/ (But without JIT code-gen, that's less efficient than hard-coding just the steps necessary for one constant.)
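
As a rough illustration of the multiplicative-inverse idea for one fixed divisor, here is a C sketch for dividing by 10. The function names are made up for this example. The constant 0xCCCCCCCD is ceil(2^35 / 10), the value compilers commonly use for unsigned 32-bit division by 10. Other divisors need their own constant and shift count.

```c
#include <stdint.h>

/* x / 10 for any 32-bit unsigned x, using a fixed-point reciprocal:
   0xCCCCCCCD is ceil(2^35 / 10), so bits 35 and up of the 64-bit
   product are the exact quotient. */
static inline uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Remainder via multiply-and-subtract instead of a second division. */
static inline uint32_t mod10(uint32_t x)
{
    return x - div10(x) * 10u;
}
```

In assembly the same shape is a widening multiply, a shift, then a multiply and subtract to recover the remainder.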

Never use div for known powers of 2: it's much slower than an AND instruction for the remainder, or a right shift for the quotient. Look at C compiler output for examples of unsigned or signed division by powers of 2, e.g. on the Godbolt compiler explorer. If you know a runtime input is a power of 2, use lea eax, [esi-1] ; and eax, edi or something like that to do x & (y-1). Modulo 256 is even more efficient: movzx eax, cl has zero latency on recent Intel CPUs (mov-elimination), as long as the two registers are separate.
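
The x & (y-1) trick above can be written in C roughly as follows. The helper names are hypothetical, and the mask is only correct for unsigned values when y really is a power of 2.

```c
#include <stdint.h>

/* Remainder of x / y for unsigned x, assuming y is a power of 2.
   For y == 0 or any other y the result is not a remainder. */
static inline uint32_t mod_pow2(uint32_t x, uint32_t y)
{
    return x & (y - 1);      /* same idea as  lea eax,[esi-1] / and eax,edi */
}

/* Modulo 256 is just the low byte, which is what movzx eax, cl computes. */
static inline uint32_t mod_256(uint32_t x)
{
    return (uint8_t)x;
}
```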

In the simple/general case: unknown value at runtime

The DIV instruction (and its counterpart IDIV for signed numbers) gives both the quotient and remainder. For unsigned, remainder and modulus are the same thing. Signed idiv gives you the remainder (not the modulus), which can be negative: e.g. -5 / 2 = -2 rem -1. x86 division semantics exactly match C99's % operator.
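
If you actually want a non-negative modulus rather than the remainder, one common fix-up is to add the divisor back when the remainder is negative. A C sketch, assuming a positive divisor (the helper names are invented for this example):

```c
/* C99 '%' truncates toward zero like idiv, so the remainder takes the
   sign of the dividend. */
static inline int rem_trunc(int a, int b)
{
    return a % b;
}

/* Non-negative modulus, assuming b > 0: fix up a negative remainder. */
static inline int mod_floor(int a, int b)
{
    int r = a % b;
    return (r < 0) ? r + b : r;
}
```

In assembly the same fix-up is a test of EDX after idiv followed by a conditional add.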

DIV r32 divides a 64-bit number in EDX:EAX by a 32-bit operand (in any register or memory) and stores the quotient in EAX and the remainder in EDX. It faults on overflow of the quotient.

Unsigned 32-bit example (works in any mode)

    mov eax, 1234          ; dividend low half
    mov edx, 0             ; dividend high half = 0.  prefer  xor edx,edx
    mov ebx, 10            ; divisor can be any register or memory

    div ebx       ; Divides 1234 by 10.
            ; EDX =   4 = 1234 % 10  remainder
            ; EAX = 123 = 1234 / 10  quotient
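
To make the register roles concrete, here is a small C model of what unsigned div r32 computes. This is only a sketch: the function name and the bool return convention are assumptions of this example, with false standing in for the cases where the CPU would fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models unsigned  div r32 : the dividend is EDX:EAX, the quotient goes to
   EAX and the remainder to EDX.  Returns false where the CPU would fault. */
static bool div32_model(uint32_t *eax, uint32_t *edx, uint32_t divisor)
{
    if (divisor == 0)
        return false;                          /* divide by zero */
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    uint64_t quotient = dividend / divisor;
    if (quotient > UINT32_MAX)
        return false;                          /* quotient doesn't fit in EAX */
    uint32_t remainder = (uint32_t)(dividend % divisor);
    *eax = (uint32_t)quotient;
    *edx = remainder;
    return true;
}
```

Calling it with eax = 1234, edx = 0 and a divisor of 10 leaves 123 in eax and 4 in edx, matching the example above.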

In 16-bit assembly you can do div bx to divide a 32-bit operand in DX:AX by BX. See Intel's Architectures Software Developer’s Manuals for more information.

Normally, use xor edx,edx before unsigned div to zero-extend EAX into EDX:EAX. This is how you do "normal" 32-bit / 32-bit => 32-bit division.

For signed division, use cdq before idiv to sign-extend EAX into EDX:EAX. See also "Why should EDX be 0 before using the DIV instruction?". For other operand-sizes, use cbw (AL->AX), cwd (AX->DX:AX), cdq (EAX->EDX:EAX), or cqo (RAX->RDX:RAX) to set the top half to 0 or -1 according to the sign bit of the low half.
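
A short C sketch of why the choice between zero-extension and sign-extension matters for signed inputs. It shows the 64-bit dividend that idiv would see in EDX:EAX in each case; the function names are made up for illustration.

```c
#include <stdint.h>

/* The 64-bit value in EDX:EAX after  xor edx,edx  (zero-extension). */
static inline int64_t dividend_after_xor(int32_t eax)
{
    return (int64_t)(uint32_t)eax;
}

/* The 64-bit value in EDX:EAX after  cdq  (sign-extension). */
static inline int64_t dividend_after_cdq(int32_t eax)
{
    return (int64_t)eax;
}
```

For EAX = -5, zero-extension produces 4294967291 while cdq produces -5, so only the latter gives the intended signed dividend. For non-negative inputs the two agree.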

div / idiv are available in operand-sizes of 8, 16, 32, and (in 64-bit mode) 64-bit. 64-bit operand-size is much slower than 32-bit or smaller on current Intel CPUs, but AMD CPUs only care about the actual magnitude of the numbers, regardless of operand-size.

Note that 8-bit operand-size is special: the implicit inputs/outputs are in AH:AL (aka AX), not DL:AL. See 8086 assembly on DOSBox: Bug with idiv instruction? for an example.

Signed 64-bit division example (requires 64-bit mode)

   mov    rax,  0x8000000000000000   ; INT64_MIN = -9223372036854775808
   mov    ecx,  10           ; implicit zero-extension is fine for positive numbers

   cqo                       ; sign-extend into RDX, in this case = -1 = 0xFF...FF
   idiv   rcx
       ; quotient  = RAX = -922337203685477580 = 0xf333333333333334
       ; remainder = RDX = -8                  = 0xfffffffffffffff8

Limitations / common mistakes

div dword 10 is not encodeable into machine code (so your assembler will report an error about invalid operands).

With mul/imul, you should normally use the faster 2-operand imul r32, r/m32 or 3-operand imul r32, r/m32, imm8/32 forms, which don't waste time writing a high-half result. Division has no such newer opcode: there is no division by an immediate, and no 32-bit / 32-bit => 32-bit division or remainder without the high-half dividend input.

Division is so slow and (hopefully) rare that they didn't bother to add a way to let you avoid EAX and EDX, or to use an immediate directly.

div and idiv will fault if the quotient doesn't fit into one register (AL / AX / EAX / RAX, the same width as the divisor, i.e. half the width of the dividend). This includes division by zero, but will also happen with a non-zero EDX and a smaller divisor. This is why C compilers just zero-extend or sign-extend instead of splitting up a 32-bit value into DX:AX.

Quotient overflow is also why INT_MIN / -1 is undefined behaviour in C: it overflows the signed quotient on 2's complement systems like x86. See "Why does integer division by -1 (negative one) result in FPE?" for an example of x86 vs. ARM. x86 idiv does indeed fault in this case.
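
In C, the two faulting cases for a plain signed division can be checked before dividing. A minimal sketch; the wrapper name and its bool return convention are assumptions of this example, not a standard API.

```c
#include <limits.h>
#include <stdbool.h>

/* Computes a / b and a % b only when idiv would not fault.
   Returns false without touching the outputs otherwise. */
static bool checked_divmod(int a, int b, int *quot, int *rem)
{
    if (b == 0)
        return false;                 /* divide by zero */
    if (a == INT_MIN && b == -1)
        return false;                 /* -INT_MIN doesn't fit in an int */
    *quot = a / b;
    *rem  = a % b;
    return true;
}
```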

The x86 exception is #DE - divide exception. On Unix/Linux systems, the kernel delivers a SIGFPE arithmetic exception signal to processes that cause a #DE exception. (See also "On which platforms does integer divide by zero trigger a floating point exception?")

For div, using a dividend with high_half < divisor is safe. e.g. 0x11:23 / 0x12 is less than 0xff so it fits in an 8-bit quotient.
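
The high_half < divisor rule can be checked with a small C model of the 8-bit form, where the dividend is AX (AH:AL), the quotient lands in AL and the remainder in AH. This is a sketch with an invented function name; false stands in for the fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models 8-bit  div r8 : dividend in AX, quotient -> AL, remainder -> AH. */
static bool div8_model(uint16_t *ax, uint8_t divisor)
{
    uint8_t high = (uint8_t)(*ax >> 8);
    if (high >= divisor)              /* quotient would need more than 8 bits; */
        return false;                 /* this also covers divisor == 0         */
    uint8_t q = (uint8_t)(*ax / divisor);
    uint8_t r = (uint8_t)(*ax % divisor);
    *ax = (uint16_t)((r << 8) | q);   /* AH = remainder, AL = quotient */
    return true;
}
```

With AX = 0x1123 and a divisor of 0x12 the quotient is 0xF3 and the remainder is 13. With AX = 0x1223 the high half equals the divisor, so the model reports a fault.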

Extended-precision division of a huge number by a small number can be implemented by using the remainder from one chunk as the high-half dividend (EDX) for the next chunk. This is probably why they chose remainder=EDX quotient=EAX instead of the other way around.
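
The chunk-by-chunk scheme might look like this in C. It is a sketch under assumed conventions: 32-bit limbs stored most-significant first, a non-zero divisor, and a function name chosen for this example. The variable rem plays the role of EDX, and each loop iteration corresponds to one div instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Divides an n-limb unsigned number in place by a non-zero 32-bit divisor.
   limbs[0] is the MOST significant 32-bit chunk.  Returns the remainder. */
static uint32_t bignum_div_small(uint32_t *limbs, size_t n, uint32_t divisor)
{
    uint32_t rem = 0;                                      /* EDX */
    for (size_t i = 0; i < n; i++) {
        uint64_t chunk = ((uint64_t)rem << 32) | limbs[i]; /* EDX:EAX */
        limbs[i] = (uint32_t)(chunk / divisor);            /* quotient limb */
        rem      = (uint32_t)(chunk % divisor);            /* carried onward */
    }
    return rem;
}
```

Because rem is always less than the divisor, every step satisfies the high_half < divisor condition, so no quotient limb can overflow.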

answered Oct 22 '22 03:10

user786653

Tags:

x86

assembly

modulo

integer-division

enne87
