Does Visual C++ consider signed integer overflow undefined?

It's gotten a lot of attention lately that signed integer overflow is officially undefined in C and C++. However, a given implementation may choose to define it; in C++, an implementation may set std::numeric_limits<signed T>::is_modulo to true to indicate that signed integer overflow is well-defined for that type, and wraps like unsigned integers do.
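
As a minimal sketch of how to query that trait: the helper name `int_claims_modulo` is hypothetical, and the value it returns is only what the implementation claims, not a guarantee about what the optimizer does.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: what the implementation reports for plain int.
// The result differs between compilers and versions.
constexpr bool int_claims_modulo() {
    return std::numeric_limits<int>::is_modulo;
}

// Unsigned arithmetic is required to wrap, so this trait is always true.
constexpr bool unsigned_claims_modulo() {
    return std::numeric_limits<unsigned int>::is_modulo;
}
```

Printing `int_claims_modulo()` on each compiler of interest shows what it reports; only the unsigned result is fixed by the standard.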

Visual C++ sets std::numeric_limits<signed int>::is_modulo to true. This has hardly been a reliable indicator, since GCC set this to true for years and has undefined signed overflow. I have never encountered a case in which Visual C++'s optimizer has done anything but give wraparound behavior to signed integers - until earlier this week.

I found a case in which the optimizer emitted x86-64 assembly code that acted improperly if the value of exactly INT_MAX was passed to a particular function. I can't tell whether it's a bug, because Visual C++ doesn't seem to state whether signed integer overflow is considered defined. So I'm wondering, is it supposed to be defined in Visual C++?

EDIT: I found this when reading about a nasty bug in Visual C++ 2013 Update 2 that wasn't in Update 1, where the following loop generates bad machine code if optimizations are enabled:

    void func (int *b, int n)
    {
      for (int i = 0; i < n; i++)
        b[i * (n + 1)] = 1;
    }

That Update 2 bug results in the loop body being compiled as if it were b[i] = 1;, which is clearly wrong. The whole loop turned into a rep stosd.

What was really interesting was that there was weirdness in the previous version, Update 1. It generated code that didn't properly handle the case that n exactly equaled INT_MAX. Specifically, if n were INT_MAX, the multiplication would act as if n were long long instead of int - in other words, the addition n + 1 would not cause the result to become INT_MIN as it should.

This was the assembly code in Update 1:

    movsxd  rax, edx          ; RDX = 0x000000007FFFFFFF; RAX = 0x000000007FFFFFFF.
    test    edx, edx
    jle     short locret_76   ; Branch not taken, because EDX is nonnegative.
    lea     rdx, ds:4[rax*4]  ; RDX = RAX * 4 + 4; RDX becomes 0x0000000200000000.
    nop                       ; But it's wrong. RDX should now be 0xFFFFFFFE00000000.
loc_68:
    mov     dword ptr [rcx], 1
    add     rcx, rdx
    dec     rax
    jnz     short loc_68
locret_76:
    retn
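
The two stride values in the assembly comments above can be reproduced with the following sketch. The function names are hypothetical, and it assumes a 32-bit int and 4-byte array elements, as in the x86-64 code. The wrapping version goes through unsigned arithmetic so that the sketch itself contains no signed overflow.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Byte stride when n + 1 is evaluated in 64 bits, as the Update 1 code did.
std::int64_t widened_stride(std::int32_t n) {
    return (static_cast<std::int64_t>(n) + 1) * 4;
}

// Byte stride when n + 1 wraps as a 32-bit two's complement value
// and is then sign-extended to 64 bits.
std::int64_t wrapped_stride(std::int32_t n) {
    // Unsigned addition wraps by definition.
    const std::uint32_t sum = static_cast<std::uint32_t>(n) + 1u;
    // Reinterpret the 32-bit pattern as signed without relying on an
    // implementation-defined narrowing conversion.
    const std::int64_t as_signed =
        sum >= 0x80000000u ? static_cast<std::int64_t>(sum) - 4294967296LL
                           : static_cast<std::int64_t>(sum);
    return as_signed * 4;
}
```

For n == INT_MAX, `widened_stride` gives 0x0000000200000000 and `wrapped_stride` gives the bit pattern 0xFFFFFFFE00000000. For any n below INT_MAX the two agree.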

The issue is that I don't know whether this is a compiler bug - in GCC and Clang, this wouldn't be a compiler bug, because those compilers consider signed integer overflow/underflow to be undefined. Whether this is a bug in Visual C++ depends on whether Visual C++ considers signed integer overflow/underflow to be undefined.

Every other case I've seen besides this one has shown Visual C++ to consider signed overflow/underflow to be defined, hence the mystery.

Myria asked Jun 29 '14 05:06



2 Answers

Found an interesting tidbit from back in 2016 (VS2015 Update 3), where they talk about the new SSA optimizer being introduced into VS2015:

C++ Team Blog - Introducing a new, advanced Visual C++ code optimizer

... ... ...

Historically, Visual C++ did not take advantage of the fact that the C and C++ standards consider the result of overflowing signed operations undefined. Other compilers are very aggressive in this regard, which motivated the decision to implement some patterns which take advantage of undefined integer overflow behavior. We implemented the ones we thought were safe and didn’t impose any unnecessary security risks in generated code.

So there you have it. I read that as: "we never programmed in any extra bits to make use of this UB, but starting from VS2015 Update 3 we will have some."

I should note that even before that I'd be extremely wary, because for 64-bit code and 32-bit variables, if the compiler/optimizer simply puts the 32-bit signed int into a 64-bit register, you'll have undefined behavior no matter what. (As shown in "How not to code: Undefined behavior is closer than you think". Unfortunately, it's unclear from the blog post whether he used VS2015 pre or post Update 3.)

So my take on this whole affair is that MSVC always considered it UB, even though past optimizer versions did not take special advantage of the fact. The new SSA optimizer seems to do so for sure. (It would be interesting to test whether the -d2UndefIntOverflow- switch does its job.)

Martin Ba answered Oct 21 '22 23:10


Your example probably does have undefined behavior for n == INT_MAX, but not just because of signed integer overflow being undefined (which it may not be on the Microsoft compiler). Rather, you are probably invoking undefined out-of-bounds pointer arithmetic.
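
To get a feel for how far out of bounds the indexing goes, here is a rough sketch that computes the largest index the loop would touch, using 64-bit math so the calculation itself cannot overflow. The helper names are hypothetical, and the sketch assumes a 32-bit int and n >= 1.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Largest index written by the loop: i * (n + 1) at i == n - 1.
// Computed in 64 bits; for n up to INT_MAX the product stays below 2^62.
std::int64_t max_index(std::int32_t n) {
    const std::int64_t wide = n;
    return (wide - 1) * (wide + 1);
}

// Whether that largest index is still representable as a 32-bit int.
bool index_fits_in_int(std::int32_t n) {
    return max_index(n) <= INT_MAX;
}
```

With n == INT_MAX the last index is 4611686014132420608, about 4.6e18 elements, which no real array can hold. Under these assumptions the index stops fitting in an int from n == 46341 upward, long before n reaches INT_MAX.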

Demi answered Oct 22 '22 00:10