Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is this treatment of int64_t a GCC AND Clang bug?

Now, some of you will be tempted to shout undefined behaviour, but there's a problem. The type int64_t is NOT defined by the C standard but by POSIX. POSIX defines this type as:

a signed integer type with width N, no padding bits, and a two's-complement representation.

It does not leave this for the implementation to define and most definitely does NOT allow it to be treated as an unbounded integer.

linux$ cat x.c
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int stupid (int64_t a) {
  return (a+1) > a;
}

int main(void)
{
    int v;
    printf("%d\n", v = stupid(INT64_MAX));
    exit(v);
}

linux$ gcc -ox x.c -Wall && ./x
0
linux$ gcc -ox x.c -Wall -O2 && ./x # THIS IS THE ERROR.
1
linux$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

linux$ uname -a
Linux localhost 3.14.13-0-amd64 #1 SMP Sat Jul 26 20:03:23 BST 2014 x86_64 GNU/Linux
linux$ getconf LONG_BIT
32
linux$

Obviously, there's a problem here ... what is it?
Have I missed some sort of implicit cast ?

like image 993
user3710044 Avatar asked Apr 11 '15 13:04

user3710044


2 Answers

I'm still going to shout UNDEFINED BEHAVIOR.

The reasoning here is simple, the compiler assumes that you're a perfect programer and would never write any code that could possibly result in undefined behavior.

So when it sees your function :

int stupid (int64_t a) {
  return (a+1) > a;
}

It assumes that you're never going to call it with a==INT64_MAX, because that would be UB.

Therefore, this function can trivially be optimized to :

int stupid (int64_t a) {
  return 1;
}

Which can then be inlined as appropriate.

I suggest you read What Every C Programmer Should Know About Undefined Behavior for more explanations of how compilers exploit UB for optimizations purposes.

like image 128
tux3 Avatar answered Oct 24 '22 07:10

tux3


You don't need to go to POSIX to sort this out, ISO C controls this particular aspect in toto (references below are to the C11 standard).

The rest of this answer is going to go all "language lawyer" to show why it's undefined behaviour to add one to your signed value and therefore why both answers (true and false) are valid.


First, your contention that int64_t is not defined in ISO is not really correct. Section 7.20.1.1 Exact-width integer types states, when referring to intN_t types, that:

The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names.

That's why you don't need to worry about POSIX defining those types in a certain way, since ISO defines them exactly the same (two's complement, no padding, etc) assuming it has the appropriate capabilities.


So, now that we've established ISO does define them (if they're available in the implementation), let's now look at 6.5 Expressions /5:

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

Adding two identical integral types will definitely give you the same type (at least at the rank of int64_t, well above the point where integer promotions are done1), since this is dictated by the usual arithmetic conversions specified in 6.3.1.8. After the section dealing with various floating point types (of which int64_t is not), we see:

If both operands have the same type, then no further conversion is needed.

Earlier in that same section you find the statement that dictates the type of the result once a common type has been found:

Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result.

So, given that the result of INT64_MAX+1 won't actually fit into a int64_t variable, the behaviour is undefined.


Based on your comments that the encoding of int64_t indicates that adding one would wrap, you have to understand that doesn't change the clause that it's undefined at all. An implementation is still free to do whatever it wants in this case, even if that doesn't make sense based on what you think.

And, in any case, the expression INT64_MAX + 1 > INT64_MAX (where 1 undergoes the integer promotion to an int64_t) may well be simply compiled as 1 since that's arguably faster to do than actually incrementing a value and doing a comparison. And it is the correct result, since anything is the correct result :-)

In that sense, it's no different to an implementation converting:

int ub (int i) { return i++ * i++; } // big no-no
:
int x = ub (3);

into the simpler and almost certainly faster:

int x = 0;

You may contend that the answer would be better as 9 or 12 (depending on when the ++ side effects are performed) but, given undefined behaviour is a breaking of the contract between coder and compiler, the compiler is free to do whatever it wants.


In any case, if you want a well defined version of your function, you could simply opt for something like:

int stupid (int64_t a) {
    return (a == INT64_MAX) ? 0 : 1;
}

That gets you the desired/expected result without resorting to undefined behaviour :-)


1 There may be an edge case here if the width of int is actually greater than 64 bits. In that case, it may well be that the integer promotions will force int64_t to an int, allowing the expression to be well defined. I haven't looked into that in great detail so may be wrong (in other words, don't consider it a gospel part of my answer) but it's worth keeping in mind as something to check if you ever get an implementation with an int more than 64 bits wide.

like image 6
paxdiablo Avatar answered Oct 24 '22 07:10

paxdiablo