
Is (int32_t) 255 << 24 undefined behavior in gcc (C++11)?

In C++11, according to en.cppreference.com,

For signed and non-negative a, the value of a << b is a * 2^b if it is representable in the return type, otherwise the behavior is undefined.

My understanding is that, since 255 * 2^24 is not representable as an int32_t, the evaluation of (int32_t) 255 << 24 yields undefined behavior. Is that correct? Can this be compiler-dependent? It's an IP16 environment, if that matters.

Background: this comes from an argument I am having with a user at arduino.stackexchange.com. According to him, “there's nothing undefined about that at all”:

you notice that much of the bit shifting is "implementation defined". So you cannot quote chapter-and-verse from the specs. You have to go to the GCC documentation since that is the only place that can tell you what actually happens. gnu.org/software/gnu-c-manual/gnu-c-manual.html#Bit-Shifting - it's only "undefined" for a negative shift value.


Edit: From the answers so far, it would seem my reading of the C++11 standard is correct. Then the key part of my question is whether this expression invokes undefined behavior in gcc. As davmac puts it in his comment, I am asking “whether GCC, an implementation, defines a behaviour even though it is left undefined by the language standard”.

From the gcc manual I linked to, it would seem it is indeed defined, although I find the wording of this manual sounds more like a tutorial than a “language law”. From PSkocik's answer (and Kane's comment to that answer), it would instead seem it is undefined. So I am still in doubt.

I guess my dream would be to have a clear statement in some gcc documentation stating either that 1) gcc does not define any behavior that is explicitly undefined in the standard or, 2) gcc does define this behavior from version XX.XX and commits to keep it defined in all subsequent versions.

Edit 2: PSkocik deleted his answer, which I find unfortunate because it provided interesting information. From his answer, Kane's comment to the answer, and my own experiments:

  1. (int32_t)255<<24 produces a runtime error when compiled with clang and -fsanitize=undefined
  2. the same code produces no error with g++ even with -fsanitize=undefined
  3. (int32_t)256<<24 does give a runtime error when compiled with g++ -std=c++11 -fsanitize=undefined

Point 2 is consistent with the interpretation that gcc, in C++11 mode, defines the left shift more broadly than the standard. As per point 3, this definition could just be the C++14 definition. However, point 3 is inconsistent with the idea that the referenced manual is a complete definition of << in gcc (C++11 mode), as that manual provides no hint that (int32_t)256<<24 could be undefined.

Edgar Bonet asked Dec 17 '18 10:12




1 Answer

This changed over time, and with good reason, so let's go through the history. Note that in all cases, provided int is 32 bits wide, static_cast<int>(255u << 24) has always had defined behavior. Doing that side-steps all of these problems.
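As a minimal sketch of that workaround, the shift can be done on a fixed-width unsigned type and the result converted back. The helper name shl32 is hypothetical, not something from the standard or from gcc. Using uint32_t rather than 255u keeps the shift count below the operand width even where int is only 16 bits, as in the asker's environment. The sketch assumes the shift count is below 32, and it relies on the unsigned-to-signed conversion wrapping modulo 2^32, which is implementation-defined before C++20 (gcc documents that it wraps) and guaranteed from C++20 on.

```cpp
#include <cstdint>

// Hypothetical helper: shift in unsigned arithmetic, then convert back.
// Unsigned shifts are reduced modulo 2^32, so the shift itself has no
// undefined behavior as long as count < 32 (assumed by this sketch).
constexpr std::int32_t shl32(std::int32_t value, unsigned count) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(value) << count);
}

// 255 * 2^24 = 0xFF000000, which reads as -16777216 in a 32-bit signed type.
static_assert(shl32(255, 24) == -16777216, "0xFF000000 as int32_t");
```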


The original C++11 wording was:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1×2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Under that wording, 255 << 24 is undefined behavior in C++11 because the resulting value, 255×2^24 = 4278190080, is too large to be represented as a 32-bit signed integer.

This undefined behavior caused problems because evaluating a constant expression (constexpr) must diagnose undefined behavior, so some common idioms for setting values became hard errors. Hence CWG 1457:

The current wording of 8.8 [expr.shift] paragraph 2 makes it undefined behavior to create the most-negative integer of a given type by left-shifting a (signed) 1 into the sign bit, even though this is not uncommonly done and works correctly on the majority of (twos-complement) architectures [...] As a result, this technique cannot be used in a constant expression, which will break a significant amount of code.

This was a defect applied against C++11. Technically, a conforming C++11 compiler would implement all of the defect reports, and so it would be correct to say that in C++11, this is not undefined behavior; the behavior for 255 << 24 in C++11 is defined to be -16777216.

The post-defect wording can be seen in C++14:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1×2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
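To make the difference between the two wordings concrete, here is a rough sketch of each rule as a predicate for a 32-bit signed left operand. The function names are hypothetical, and the predicates only model the text quoted above; they say nothing about what a particular compiler actually does. Since a×2^b <= MAX exactly when a <= MAX >> b for non-negative a, no overflow is needed to perform the check.

```cpp
#include <cstdint>
#include <limits>

// Original C++11 wording: a is non-negative and a * 2^b fits in int32_t.
constexpr bool defined_cxx11_original(std::int32_t a, unsigned b) {
    return a >= 0 && b < 32 &&
           a <= (std::numeric_limits<std::int32_t>::max() >> b);
}

// Post-CWG 1457 wording (as in C++14): a is non-negative and a * 2^b fits
// in the corresponding unsigned type, uint32_t.
constexpr bool defined_after_cwg1457(std::int32_t a, unsigned b) {
    return a >= 0 && b < 32 &&
           static_cast<std::uint32_t>(a) <=
               (std::numeric_limits<std::uint32_t>::max() >> b);
}

static_assert(!defined_cxx11_original(255, 24), "exceeds INT32_MAX");
static_assert(defined_after_cwg1457(255, 24), "fits in uint32_t");
static_assert(!defined_after_cwg1457(256, 24), "2^32 does not fit in uint32_t");
static_assert(defined_after_cwg1457(1, 31), "shifting 1 into the sign bit");
```

Under this model 255 << 24 is defined only by the newer wording and 256 << 24 by neither, which is consistent with the sanitizer results the question reports for g++.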

There were no changes to the wording/behavior in C++17.

But for C++20, as a result of the "Signed Integers are Two's Complement" proposal (and its wording paper), the wording is greatly simplified:

The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N, where N is the range exponent of the type of the result.

255 << 24 still has defined behavior in C++20, with the same resulting value. The specification for how we get there is simply a lot shorter, because the language no longer has to work around the representation of signed integers being implementation-defined.
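The C++20 rule can be modeled directly with wider arithmetic. The following is only an illustrative sketch: shl_cxx20_model is a hypothetical name, N is assumed to be 32, the shift count is assumed to be below 32, and the function body needs C++14 or later for the relaxed constexpr rules. It computes E1×2^E2 exactly in 64 bits and then picks the one value in the int32_t range that is congruent to it modulo 2^32.

```cpp
#include <cstdint>

// Hypothetical model of the C++20 wording for a 32-bit signed result type.
constexpr std::int32_t shl_cxx20_model(std::int32_t e1, unsigned e2) {
    const std::int64_t modulus = std::int64_t{1} << 32;         // 2^N with N = 32
    const std::int64_t product = e1 * (std::int64_t{1} << e2);  // exact for e2 < 32
    std::int64_t r = product % modulus;   // remainder in (-2^32, 2^32)
    if (r < 0) r += modulus;              // now in [0, 2^32)
    if (r >= modulus / 2) r -= modulus;   // now in [-2^31, 2^31)
    return static_cast<std::int32_t>(r);
}

// 255 * 2^24 = 4278190080, and 4278190080 - 2^32 = -16777216.
static_assert(shl_cxx20_model(255, 24) == -16777216, "matches the value above");
static_assert(shl_cxx20_model(-1, 1) == -2, "negative left operands are defined too");
```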

Barry answered Sep 22 '22 01:09
