Why does integer division by -1 (negative one) result in FPE?

Tags:

c

x86

gcc

arm64

floating-point-exceptions

I have an assignment to explain some seemingly strange behaviors of C code (running on x86). I can easily complete everything else, but this one has really confused me.

Code snippet 1 outputs `-2147483648`:

    int a = 0x80000000;
    int b = a / -1;
    printf("%d\n", b);

Code snippet 2 outputs nothing, and gives a `Floating point exception`:

    int a = 0x80000000;
    int b = -1;
    int c = a / b;
    printf("%d\n", c);

I know the reason for the result of Code Snippet 1 (`1 + ~INT_MIN == INT_MIN`), but I can't quite understand how integer division by -1 can generate an FPE, nor can I reproduce it on my Android phone (AArch64, GCC 7.2.0). There, Code 2 just outputs the same as Code 1 without any exceptions. Is it a hidden ~~bug~~ feature of x86 processors?

The assignment didn't say anything else (including the CPU architecture), but since the whole course is based on a desktop Linux distro, you can safely assume it's a modern x86.

Edit: I contacted my friend and he tested the code on Ubuntu 16.04 (Intel Kaby Lake, GCC 6.3.0). The result was consistent with what the assignment stated (Code 1 output the value above and Code 2 crashed with FPE).

asked Sep 23 '17 09:09

iBug

2 Answers

There are four things going on here:

- `gcc -O0` behaviour explains the difference between your two versions: `idiv` vs. `neg`. (`clang -O0` happens to compile them both with `idiv`.) It also explains why you get this even with compile-time-constant operands.
- x86 `idiv` faulting behaviour vs. the behaviour of the division instruction on ARM.
- If integer math results in a signal being delivered, POSIX requires it to be SIGFPE: see "On which platforms does integer divide by zero trigger a floating point exception?". But POSIX doesn't require trapping for any particular integer operation. (This is why x86 and ARM are allowed to differ.)

  The Single Unix Specification defines SIGFPE as "Erroneous arithmetic operation". It's confusingly named after floating point, but in a normal system with the FPU in its default state, only integer math will raise it. On x86, only integer division does. On MIPS, a compiler could use `add` instead of `addu` for signed math, so you could get traps on signed add overflow. (gcc uses `addu` even for signed math, but an undefined-behaviour detector might use `add`.)
- C Undefined Behaviour rules (signed overflow, and division specifically), which let gcc emit code that can trap in that case.
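
If you want to observe the SIGFPE delivery described in the list above from inside a program, here is a minimal sketch, assuming a POSIX system. The names `delivers_sigfpe`, `on_sigfpe`, `int_min_div_minus_one`, `raise_fpe` and `do_nothing` are made up for this illustration and don't come from the question. The handler jumps out with `siglongjmp` instead of returning.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose sigaction and sigsetjmp */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf fpe_escape;     /* where the handler jumps back to */

static void on_sigfpe(int signo) {
    (void)signo;
    /* POSIX leaves returning from a handler for a hardware-generated
       SIGFPE undefined, so leave via siglongjmp instead. */
    siglongjmp(fpe_escape, 1);
}

/* Hypothetical helper: runs fn() and reports whether SIGFPE arrived.
   Returns 1 if the signal was delivered, 0 if not, -1 on setup failure. */
int delivers_sigfpe(void (*fn)(void)) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    int delivered;
    if (sigsetjmp(fpe_escape, 1) == 0) {   /* 1: also save the signal mask */
        fn();
        delivered = 0;
    } else {
        delivered = 1;                     /* we got here from the handler */
    }
    sigaction(SIGFPE, &old, NULL);         /* restore the previous handler */
    return delivered;
}

/* Undefined behaviour in C. The volatile operands force a runtime division,
   which traps when it is compiled to x86 idiv. */
void int_min_div_minus_one(void) {
    volatile int a = -2147483647 - 1, b = -1;
    volatile int c = a / b;
    (void)c;
}

/* Portable way to exercise the handler without relying on a hardware trap. */
void raise_fpe(void) { raise(SIGFPE); }

void do_nothing(void) {}
```

Calling `delivers_sigfpe(int_min_div_minus_one)` should report a signal on x86 and none on AArch64, but since the division is UB that result is only what current compilers happen to produce. `raise_fpe` is there so the handler path can be checked on any platform.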

gcc with no options is the same as `gcc -O0`.

> `-O0` Reduce compilation time and make debugging produce the expected results. This is the default.

This explains the difference between your two versions:

Not only does `gcc -O0` not try to optimize, it actively de-optimizes to make asm that independently implements each C statement within a function. This allows `gdb`'s `jump` command to work safely, letting you jump to a different line within the function and act like you're really jumping around in the C source. The Q&A "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?" explains more about how and why `-O0` compiles the way it does.

It also can't assume anything about variable values between statements, because you can change variables in the debugger with `set b = 4`. This is obviously catastrophically bad for performance, which is why `-O0` code runs several times slower than normal code, and why optimizing for `-O0` specifically is total nonsense. It also makes `-O0` asm output really noisy and hard for a human to read, because of all the storing/reloading and the lack of even the most obvious optimizations.

    int a = 0x80000000;
    int b = -1;
      // debugger can stop here on a breakpoint and modify b.
    int c = a / b;        // a and b have to be treated as runtime variables, not constants.
    printf("%d\n", c);

I put your code inside functions on the Godbolt compiler explorer to get the asm for those statements.

To evaluate `a/b`, `gcc -O0` has to emit code to reload `a` and `b` from memory, and not make any assumptions about their value.

But with `int c = a / -1;`, you can't change the `-1` with a debugger, so gcc can and does implement that statement the same way it would implement `int c = -a;`, with an x86 `neg eax` or AArch64 `neg w0, w0` instruction, surrounded by a load(a)/store(c). On ARM32, it's an `rsb r3, r3, #0` (reverse-subtract: `r3 = 0 - r3`).

However, `clang5.0 -O0` doesn't do that optimization. It still uses `idiv` for `a / -1`, so both versions will fault on x86 with clang. Why does gcc "optimize" at all? See "Disable all optimization options in GCC". gcc always transforms through an internal representation, and `-O0` is just the minimum amount of work needed to produce a binary. It doesn't have a "dumb and literal" mode that tries to make the asm as much like the source as possible.
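
To make concrete what the `neg` version computes, here is a small sketch in C that does the same two's complement negation using unsigned arithmetic, so the function itself has no signed overflow. `negate_wrapping` is a name invented for this example. It gives the same value as the question's `1 + ~INT_MIN == INT_MIN` observation.

```c
#include <limits.h>

/* Hypothetical helper: two's complement negation without signed overflow.
   Unsigned subtraction wraps modulo 2^N, which matches what neg produces. */
int negate_wrapping(int a) {
    unsigned int r = 0u - (unsigned int)a;
    /* Converting an unsigned value above INT_MAX back to int is
       implementation-defined; gcc documents it as reduction modulo 2^N. */
    return (int)r;
}
```

With gcc on a 32-bit-`int` target, `negate_wrapping(INT_MIN)` comes back as `INT_MIN`, which is the value snippet 1 prints.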

x86 `idiv` vs. AArch64 `sdiv`:

x86-64:

    # int c = a / b  from x86_fault()
    mov     eax, DWORD PTR [rbp-4]
    cdq                                 # dividend sign-extended into edx:eax
    idiv    DWORD PTR [rbp-8]           # divisor from memory
    mov     DWORD PTR [rbp-12], eax     # store quotient

Unlike `imul r32,r32`, there's no 2-operand `idiv` that doesn't have a dividend upper-half input. Anyway, not that it matters; gcc is only using it with `edx` = copies of the sign bit in `eax`, so it's really doing a 32b / 32b => 32b quotient + remainder. As documented in Intel's manual, `idiv` raises `#DE` on:

- divisor = 0
- The signed result (quotient) is too large for the destination.

Overflow can easily happen if you use the full range of divisors, e.g. for `int result = long long / int` with a single 64b / 32b => 32b division. But gcc can't do that optimization because it's not allowed to make code that would fault instead of following the C integer promotion rules and doing a 64-bit division and then truncating to `int`. It also doesn't optimize even in cases where the divisor is known to be large enough that it couldn't `#DE`.

When doing 32b / 32b division (with `cdq`), the only input that can overflow is `INT_MIN / -1`. The "correct" quotient is a 33-bit signed integer, i.e. positive `0x80000000` with a leading-zero sign bit to make it a positive 2's complement signed integer. Since this doesn't fit in `eax`, `idiv` raises a `#DE` exception. The kernel then delivers `SIGFPE`.
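
The "33-bit quotient" point above can be checked in plain C by doing the division in 64 bits, where it can't overflow. This is just a sketch; `wide_quotient` and `quotient_fits_32` are names I made up for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the quotient in 64 bits, where a
   32-bit / 32-bit division cannot overflow. Caller must ensure b != 0. */
int64_t wide_quotient(int32_t a, int32_t b) {
    return (int64_t)a / (int64_t)b;
}

/* True if the mathematically correct quotient fits in a 32-bit register,
   i.e. if a 32-bit idiv would not overflow for these inputs. */
bool quotient_fits_32(int32_t a, int32_t b) {
    int64_t q = wide_quotient(a, b);
    return q >= INT32_MIN && q <= INT32_MAX;
}
```

`wide_quotient(INT32_MIN, -1)` is 2147483648, one more than `INT32_MAX`, so `quotient_fits_32` returns false for exactly that pair of inputs and true for every other non-zero divisor.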

AArch64:

    # int c = a / b  from x86_fault()  (which doesn't fault on AArch64)
    ldr     w1, [sp, 12]
    ldr     w0, [sp, 8]          # 32-bit loads into 32-bit registers
    sdiv    w0, w1, w0           # 32 / 32 => 32 bit signed division
    str     w0, [sp, 4]

ARM hardware division instructions don't raise exceptions for divide by zero or for `INT_MIN / -1` overflow. Nate Eldredge commented:

> The full ARM architecture reference manual states that UDIV or SDIV, when dividing by zero, simply return zero as the result, "without any indication that the division by zero occurred" (C3.4.8 in the Armv8-A version). No exceptions and no flags - if you want to catch divide by zero, you have to write an explicit test. Likewise, signed divide of `INT_MIN` by `-1` returns `INT_MIN` with no indication of the overflow.

AArch64 `sdiv` documentation doesn't mention any exceptions.

However, software implementations of integer division may raise: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4061.html. (gcc uses a library call for division on ARM32 by default, unless you set a -mcpu that has HW division.)
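
As a rough model of the hardware `sdiv` results quoted above, here is a sketch in portable C. `sdiv32_model` is a hypothetical name, and this only models the result values from the quote, not anything else about the instruction.

```c
#include <stdint.h>

/* Hypothetical model of the 32-bit sdiv results described in the quote. */
int32_t sdiv32_model(int32_t n, int32_t d) {
    if (d == 0)
        return 0;                 /* divide by zero yields 0, no trap */
    if (n == INT32_MIN && d == -1)
        return INT32_MIN;         /* overflow yields INT_MIN, no trap */
    return n / d;                 /* ordinary case: truncating division */
}
```

For the question's inputs, `sdiv32_model(INT32_MIN, -1)` returns `INT32_MIN`, which matches what the Android phone printed for Code 2.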

C Undefined Behaviour.

As PSkocik explains, `INT_MIN / -1` is undefined behaviour in C, like all signed integer overflow. This allows compilers to use hardware division instructions on machines like x86 without checking for that special case. If it had to not fault, unknown inputs would require run-time compare-and-branch checks, and nobody wants C to require that.

More about the consequences of UB:

With optimization enabled, the compiler can assume that `a` and `b` still have their set values when `a/b` runs. It can then see the program has undefined behaviour, and thus can do whatever it wants. gcc chooses to produce `INT_MIN` like it would from `-INT_MIN`.

On a 2's complement system, the most-negative number is its own negative. This is a nasty corner-case for 2's complement, because it means `abs(x)` can still be negative. https://en.wikipedia.org/wiki/Two%27s_complement#Most_negative_number

    int x86_fault() {
        int a = 0x80000000;
        int b = -1;
        int c = a / b;
        return c;
    }

compiles to this with `gcc6.3 -O3` for x86-64:

    x86_fault:
        mov     eax, -2147483648
        ret

but `clang5.0 -O3` compiles it to this (with no warning, even with `-Wall -Wextra`):

    x86_fault:
        ret

Undefined Behaviour really is totally undefined. Compilers can do whatever they feel like, including returning whatever garbage was in `eax` on function entry, or emitting a store through a NULL pointer followed by an illegal instruction, e.g. with `gcc6.3 -O3` for x86-64:

    int *local_address(int a) {
        return &a;
    }

    local_address:
        xor     eax, eax     # return 0
        ret

    void foo() {
        int *p = local_address(4);
        *p = 2;
    }

    foo:
        mov     DWORD PTR ds:0, 0     # store immediate 0 into absolute address 0
        ud2                           # illegal instruction

Your case with `-O0` didn't let the compilers see the UB at compile time, so you got the "expected" asm output.

See also What Every C Programmer Should Know About Undefined Behavior (the same LLVM blog post that Basile linked).

answered Oct 28 '22 07:10

Peter Cordes

Signed `int` division in two's complement is undefined if:

1. the divisor is zero, OR
2. the dividend is `INT_MIN` (== `0x80000000` if `int` is `int32_t`) and the divisor is `-1` (in two's complement, `-INT_MIN > INT_MAX`, which causes integer overflow, which is undefined behavior in C)

(https://www.securecoding.cert.org recommends wrapping integer operations in functions that check for such edge cases)

Since you're invoking undefined behavior by breaking rule 2, anything can happen, and on your platform this particular anything happens to be an FPE signal generated by your processor.
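
A wrapper of the kind mentioned above might look something like this. It's only a sketch: `checked_div` and its signature are my own invention, not taken from the CERT site.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrapper: stores a / b in *out and returns true, or returns
   false without dividing when the operation would be undefined. */
bool checked_div(int a, int b, int *out) {
    if (b == 0)
        return false;             /* rule 1: division by zero */
    if (a == INT_MIN && b == -1)
        return false;             /* rule 2: quotient would exceed INT_MAX */
    *out = a / b;
    return true;
}
```

The caller decides what to do on `false`, so the two undefined cases never reach the hardware division instruction.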

answered Oct 28 '22 06:10

PSkocik
