When compiling the following code:
int f(int i1,int i2)
{
long l1=i1;
long l2=i2;
return l1*l2;
}
with clang 10.1
for x86-64
and with -O3
, I get
mov eax, edi
imul eax, esi
ret
The compiler recognizes, that a full 64 bit operation is not needed.
However, when I replace the multiplication by a division:
int f(int i1,int i2)
{
long l1=i1;
long l2=i2;
return l1/l2;
}
it compiles into
movsx rax, edi
movsx rsi, esi
cqo
idiv rsi
ret
So it uses a 64-bit division (same for gcc).
What is the counter-example which prevents to use a 32-bit division here?
Consider what happens when i1 == INT_MIN == -2147483648
and i2 == -1
.
For comparison, let's also consider
int g(int i1, int i2) {
return i1/i2;
}
which compiles to a simple 32-bit idiv
.
If you call g(INT_MIN, -1)
, the division will overflow, since the result 2147483648
doesn't fit in int
. This results in undefined behavior at the level of C, and in fact the idiv
instruction will generate an exception.
If you instead call f(INT_MIN, -1)
, the division does not overflow, because the result does indeed fit in a long
. Now the return
statement causes it to be converted to int
by the usual integer conversions. Since the value doesn't fit in the signed type int
, the result is implementation-defined, and gcc documents what it will do:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
So it needs to generate code that ensures that an exception is not generated, and that the value returned is -2147483648
(which is equivalent to 2147483648
mod 2^32). A 32-bit divide won't do it, but a 64-bit divide will.
Interestingly, icc handles this by special-casing i2 == -1
and doing a 64-bit divide only in that case, and a 32-bit divide otherwise. This might be reasonable as it seems that 64-bit IDIV is potentially several times more expensive than 32-bit, glancing at Agner Fog's instruction tables. Though you might wish it would use NEG instead of dividing in that case (and if you're wondering, yes, NEG of INT_MIN
is INT_MIN
as desired).
(Actually, the special casing of -1
by icc was the hint that helped me realize the counterexample.)
No such special handling is needed for multiplication, because the behavior of imul
on overflow is already what is needed for conversion: it truncates to 32 bits with no exception.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With