How can (x+1) > x evaluate to both 0 and 1?

Tags:

c++

c

undefined-behavior

I was learning about undefined behaviour and stumbled upon this code without any clear explanation:

#include <stdio.h>
#include <limits.h>

int foo ( int x) {
    printf ("% d\n" ,  x );   //2147483647
    printf ("% d\n" ,  x+1 ); //-2147483648  overflow
    return ( x+1 ) > x ;      // 1 but How????
}

int main ( void ) {
    printf ("% d\n" ,  INT_MAX );     //2147483647
    printf ("% d\n" ,  INT_MAX+1 );   //-2147483648  overflow
    printf ("% d\n" , ( INT_MAX+1 ) > INT_MAX );  //0  makes sense, since -ve < +ve
    printf ("% d\n" ,  foo(INT_MAX) );  //1
    return 0;
}

When compiling on gcc, the compiler issues a warning:

warning: integer overflow in expression of type 'int' results in '-2147483648'

So, clearly the value of INT_MAX+1 is negative, which explains why (INT_MAX+1) > INT_MAX evaluates to 0.

But, why (or how) is (x+1) > x evaluating to 1 for x = INT_MAX in foo(...)?

asked May 28 '21 18:05

avm

2 Answers

When a program exhibits undefined behavior, the C standard makes no predictions regarding what the program will do. The program may crash, it may output strange results, or it may appear to work properly.

In fact, compilers will often work under the assumption that a program does not contain undefined behavior.

In the case of this expression:

( x+1 ) > x

Given that x has type int, the compiler knows that signed overflow is UB and works under the assumption that it will not occur. With that in mind, there is no value of x for which this expression is both well-defined and false, so the compiler may optimize the comparison away and replace it with the constant 1.

When I run this program under gcc 4.8.5, I get the following results with -O0 and -O1:

 2147483647
-2147483648
 0
 2147483647
-2147483648
 0

And the following with -O2 and -O3:

 2147483647
-2147483648
 0
 2147483647
-2147483648
 1

Then looking at the assembly for foo in the latter case:

foo:
.LFB11:
    .file 1 "x1.c"
    .loc 1 4 0
    .cfi_startproc
.LVL0:
    pushq   %rbx                // first call to printf
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    .loc 1 5 0
    movl    %edi, %esi
    .loc 1 4 0
    movl    %edi, %ebx
    .loc 1 5 0
    xorl    %eax, %eax
    movl    $.LC0, %edi
.LVL1:
    call    printf
.LVL2:
    .loc 1 6 0                  // second call to printf
    leal    1(%rbx), %esi
    movl    $.LC0, %edi
    xorl    %eax, %eax
    call    printf
.LVL3:
    .loc 1 8 0                  // return value
    movl    $1, %eax
    popq    %rbx
    .cfi_def_cfa_offset 8
.LVL4:
    ret
    .cfi_endproc

We can see that's exactly what the compiler did: it optimized away the comparison and always returns 1.

This illustrates how compilers can make use of undefined behavior to apply various optimizations.
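If the intent is to detect that x+1 would not fit, the check has to be done without performing the overflowing addition. The following is a minimal sketch of that idea; `increment_would_overflow` and `foo_checked` are hypothetical names that are not part of the question's code.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether x + 1 would overflow,
   without ever evaluating x + 1. */
bool increment_would_overflow(int x)
{
    return x == INT_MAX;
}

/* Hypothetical variant of foo: returns 1 when x + 1 is representable
   (and therefore greater than x), and 0 otherwise.
   No signed overflow occurs on any path. */
int foo_checked(int x)
{
    if (increment_would_overflow(x))
        return 0;
    return (x + 1) > x;
}
```

Because every execution path is well-defined, the result should not depend on the optimization level: `foo_checked(INT_MAX)` returns 0 and any other argument returns 1.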

answered Oct 23 '22 18:10

dbush

When the Standard was written, compilers for conventional architectures would often perform integer arithmetic in wraparound two's-complement fashion, but there were times when doing something else might be more useful. As a couple of examples:

1. If a program was known not to deliberately cause integer overflows, having an implementation trap on overflow would be less bad than having it produce output that was superficially valid but wrong.
2. Even on commonplace platforms, it was sometimes advantageous to perform arithmetic as though using a wider than specified type. For example, on the 8086, the multiply instruction would take two 16-bit operands and produce a 32-bit result, so when performing a computation like int32a=int16a*int16b+int32b;, keeping the 32-bit result of the multiplication would be cheaper than using a sign-extension instruction to promote the bottom 16 bits of the result to 32 bits. Additionally, that abstraction model would allow many kinds of expressions to be simplified, such as replacing (x*30/15) with (x*2), or (as shown in the example), x+y > x with y > 0.
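To see why a simplification such as (x*30/15) to (x*2) is only valid when the intermediate product is not required to wrap, it can help to compare the two forms using unsigned arithmetic, where wraparound is well-defined. This is an illustrative sketch only; the function names are hypothetical and the fixed-width type uint32_t is used so that the wrap point is the same everywhere.

```c
#include <stdint.h>

/* Evaluates x*30/15 with the multiplication reduced modulo 2^32,
   mimicking what strict 32-bit wraparound arithmetic would produce. */
uint32_t scale_wrapping(uint32_t x)
{
    uint32_t product = (uint32_t)(x * UINT32_C(30)); /* may wrap */
    return product / 15u;
}

/* The "simplified" form, also reduced modulo 2^32. */
uint32_t scale_simplified(uint32_t x)
{
    return (uint32_t)(x * UINT32_C(2));
}
```

For small inputs the two agree (both return 20 for x = 10), but for x = 200000000 the wrapping form yields 113668846 while the simplified form yields 400000000. A compiler that had to preserve wraparound could therefore not make this substitution.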

Rather than trying to guess at all the ways it might be useful for an implementation to handle integer overflow, or risk preventing implementations from treating integer overflow in whatever fashion their customers would find most useful, the Standard lets implementations choose whatever method they think most useful. The authors of gcc have decided that the most useful way to process integer overflow is to use it to produce extended inferences that aren't bound by normal laws of causality.

Consider, for example:

unsigned arr[32771];
unsigned mul_mod_32768(unsigned short x, unsigned short y)
{
    /* Note that the authors of the Standard specified that the multiply
       here happens as signed, because--according to the Rationale--they
       expected that commonplace implementations would process signed and
       unsigned math identically in cases like this! */
    return (x * y) & 0x7FFFu;
}
void test(unsigned short n)
{
    unsigned total=0;        
    unsigned short s2=65535;
    for (unsigned short i=32768; i < n; i++)
    {
        total += mul_mod_32768(i, 65535);
    }
    if (n < 32770)
        arr[n] = total;
}

At optimization level 2 or 3, gcc will generate code for test() that is precisely equivalent to:

void test(unsigned short n)
{
    arr[n] = 0;
}

If n is 32768 or less, the loop won't run at all, total will be zero, and total will be stored into arr[n]. If n is 32769, the loop will run once, adding 0 to total, which will then be stored into arr[n]. If n is 32770 or greater, the Standard won't impose any requirements, so gcc will process those cases the same way as it processed the others, blindly storing zero into arr[n].
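One way to avoid the problem in code like this is to force the multiplication to be performed as unsigned, so that the operands are not promoted to signed int. A minimal sketch, assuming the caller only needs the low 15 bits of the product; the name `mul_mod_32768_u` is hypothetical:

```c
/* Hypothetical rewrite of mul_mod_32768: converting one operand to
   unsigned before multiplying makes the whole product an unsigned
   computation, which wraps in a well-defined way instead of overflowing. */
unsigned mul_mod_32768_u(unsigned short x, unsigned short y)
{
    return ((unsigned)x * y) & 0x7FFFu;
}
```

With this version, `mul_mod_32768_u(32769, 65535)` is well-defined and evaluates to 32767, so the compiler has no undefined case from which to infer that n is less than 32770.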

The Standard deliberately makes no attempt to forbid implementations which are specialized for particular narrow purposes from behaving in ways that would make them unsuitable for many others. The behavior of gcc here may be suitable for use with programs that will process data exclusively from trustworthy sources, but that doesn't imply that it should be viewed as suitable for anything else. Unfortunately, the language clang and gcc seek to process is very different from the language the C Standards Committee was chartered to describe.

answered Oct 23 '22 18:10

supercat
