The other day I upgraded my Windows build environment from MSVC2013 to MSVC2017, and lo and behold, a function in my program that had been working fine for years (and still works fine under g++/clang) suddenly started giving incorrect results when compiled with MSVC2017.
I was able to rewrite the function to give correct results again, but the experience made me curious -- was my function invoking undefined behavior (that just happened to give correct results until now), or was the code well-defined and MSVC2017 was being buggy?
Below is a trivial program showing a toy version of the function both before and after I rewrote it. In particular, does the function maybe_invokes_undefined_behavior(), as shown below, invoke undefined behavior, when called with an argument of value -32762?
#include <stdio.h>
enum {ciFirstToken = -32768};
// This function sometimes gives unexpected results under MSVC2017
void maybe_invokes_undefined_behavior(short token)
{
if (token >= 0) return;
token -= ciFirstToken; // does this invoke undefined behavior if (token==-32762) and (ciFirstToken==-32768)?
if (token == 6)
{
printf("Token is 6, as expected (unexpected behavior not reproduced)\n");
}
else
{
printf("Token should now be 6, but it's actually %i\n", (int) token); // under MSVC2017 this prints -65530 !?
}
}
// This function is rewritten to use int-math instead of short-math and always gives the expected result
void allgood(short token16)
{
if (token16 >= 0) return;
int token = token16;
token -= ciFirstToken;
if (token == 6)
{
printf("Token is 6, as expected (odd behavior not reproduced)\n");
}
else
{
printf("Token should now be 6, but it's actually %i\n", (int) token);
}
}
int main(int, char **)
{
maybe_invokes_undefined_behavior(-32762);
allgood(-32762);
return 0;
}
So, in C/C++ programming, undefined behavior means when the program fails to compile, or it may execute incorrectly, either crashes or generates incorrect results, or when it may fortuitously do exactly what the programmer intended.
It exists because of the syntax rules of C where a variable can be declared without init value. Some compilers assign 0 to such variables and some just assign a mem pointer to the variable and leave just like that. if program does not initialize these variables it leads to undefined behavior.
Undefined behavior can lead to security vulnerabilities in software. For example, buffer overflows and other security vulnerabilities in the major web browsers are due to undefined behavior. The Year 2038 problem is another example due to signed integer overflow.
does this invoke undefined behavior if (token==-32762) and (ciFirstToken==-32768)?
token -= ciFirstToken;
NO (for the short answer)
Now let's break this down piece by piece.
1) As per expr.ass for compound assignment, -=
:
The behavior of an expression of the form
E1
op=E2
is equivalent toE1 = E1 op E2
except thatE1
is evaluated only once.
the expression:
token -= ciFirstToken;
is equivalent to:
token = token - ciFirstToken;
// ^ binary (not unary)
2) The additive operator (-
) performs usual arithmetic conversion for operands of arithmetic type.
As per expr.arith.conv/1
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
(1.5) Otherwise, the integral promotions shall be performed on both operands.
3) Both operands are then promoted to an int
.
As per conv.prom/1:
A prvalue of an integer type other than
bool
,char16_
t,char32_t
, or wchar_t whose integer conversion rank is less than the rank ofint
can be converted to a prvalue of typeint
ifint
can represent all the values of the source type;
4) After the integer promotion, no further conversion is required.
As per expr.arith.conv/1.5.1
If both operands have the same type, no further conversion is needed.
5) Finally, Undefined behavior for expressions is defined as per expr.pre:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined
CONCLUSION
So now substituting the values:
token = -32762 - (-32768);
After all the integer promotions, both operands fall within the valid range of INT_MIN[1] and INT_MAX[2].
And after evaluation, the mathematical result (6) is then implicitly converted to short
, which is within the valid range of short
.
Thus, expression is well-formed.
Many thanks to @MSalters, @n.m and @Arne Vogel for helping with this answer.
Visual Studio 2015 MSVC14 Integer Limits and MS Integer Limits defines:
[1] INT_MIN -2147483648
[2] INT_MAX +2147483647
SHRT_MIN –32768
SHRT_MAX +32767
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With