Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimization of subsequent calls to integer division and modulo (remainder)

Integer division / and modulo % operations are often used together in programming, sometimes even on the same operands and in subsequent lines. For example, the following C function, which is a simple function that sums the result of a / of 2 numbers with the result of their %, does just that:

int sum2digits(int x, int base) {
    int n, m;
    n = x / base;
    m = x % base;
    return n + m;
}

As I know, both / and % are performed by the same machine instruction (in x86). Say, if you perform a machine instruction for integer division (div or idiv) of two numbers, a and b, then afterwards the value of a / b will be stored in the register EAX and the remainder a % b in EDX.
I wondered whether the compiler takes advantage of this quality and took a look at the assembly code. It turns out that normal compilation with gcc doesn't optimize this:

push   %rbp
mov    %rsp,%rbp
mov    %edi,-0x14(%rbp)
mov    %esi,-0x18(%rbp)
mov    -0x14(%rbp),%eax
mov    %eax,%edx
sar    $0x1f,%edx
idivl  -0x18(%rbp)
mov    %eax,-0x8(%rbp)
mov    -0x14(%rbp),%eax
mov    %eax,%edx
sar    $0x1f,%edx
idivl  -0x18(%rbp)
mov    %edx,-0x4(%rbp)
mov    -0x4(%rbp),%eax
mov    -0x8(%rbp),%edx
add    %edx,%eax
pop    %rbp
retq   

This assembly code does 2 subsequent calls to idivl, but each time reads the result from another register (EAX for quotient, EDX for remainder). However, compiling with the -O changes the picture:

mov    %edi,%eax
mov    %edi,%edx
sar    $0x1f,%edx
idiv   %esi
add    %edx,%eax
retq  

This code calls idiv only once, and uses its value for both computations.
Why isn't this kind of optimization a default? What is the use of calling div twice in a row? Can this optimization change the behaviour of a program in any way?
Also, and perhaps even more important, is there a way, as a programmer, to manually extract these 2 values (quotient and remainder) guaranteeing that only 1 integer division is performed by the CPU?

like image 351
ehudt Avatar asked Apr 09 '13 21:04

ehudt


2 Answers

Why isn't this kind of optimization a default?

If compilers and optimizers were perfect and debuggers could reverse-engineer code, optimization would be a universal default. But compilers don't always generate correct code, optimizers don't always preserve semantics, and debuggers can't always figure out what part(s) of a optimized program any given instruction relates to. It looks like Your compiler was installed with default options preset for absolute safety and debugging simplicity.

is there a way to manually extract these 2 values (quotient and remainder) guaranteeing that only 1 integer division is performed by the CPU?

These days the best way is exactly what you did: ask the compiler for optimized code. The div routine is a holdover from days when results from the division operators were implementation defined for negative numbers and optimizing compiles were so painfully slow that identifying things like this was better done manually.

like image 151
jthill Avatar answered Nov 14 '22 23:11

jthill


You could always implement your own division:

#include <stdlib.h>
#include <stdio.h>

void mydiv(int dividend, int divisor, int* quotient, int* remainder)
{
  *quotient = dividend / divisor;
  *remainder = dividend - *quotient * divisor;
}

int testData[][2] =
{
  { +5, +3 },
  { +5, -3 },
  { -5, +3 },
  { -5, -3 },
};

int main(void)
{
  unsigned i;
  for (i = 0; i < sizeof(testData)/sizeof(testData[0]); i++)
  {
    div_t res1, res2;
    res1 = div(testData[i][0], testData[i][1]);
    mydiv(testData[i][0], testData[i][1], &res2.quot, &res2.rem);
    printf("%+d/%+d = %+d:%+d %c= %+d:%+d\n",
           testData[i][0], testData[i][1],
           res1.quot, res1.rem,
           "!="[res1.quot == res2.quot && res1.rem == res2.rem],
           res2.quot, res2.rem);
  }
  return 0;
}

Output (ideone):

+5/+3 = +1:+2 == +1:+2
+5/-3 = -1:+2 == -1:+2
-5/+3 = -1:-2 == -1:-2
-5/-3 = +1:-2 == +1:-2

This does have one division. However, it looks like gcc isn't smart enough to eliminate the multiplication and so you have one of each.

like image 41
Alexey Frunze Avatar answered Nov 14 '22 23:11

Alexey Frunze