Zero optimization in the compiler

Tags: c

In a lecture in a computer structure course, my lecturer showed us the following code

#include <stdio.h>

int main() {

    char c = 125;
    while (c < (c + 1)) {
        printf("%d ", c);
        c++;
    }

    return 0;
}

She claimed that when the compiler optimizes, the code will run in an infinite loop because of overflow. But when I tested the code (in a Linux environment) and compiled it with the -O0 flag, it still ran in an infinite loop. My friend claimed that c++ may overflow, but that in the condition c < (c + 1) the compiler treats the operands as int, so the loop stays infinite even with no optimization.

I would appreciate it if someone has an answer for me, thanks!

user28816434 asked Oct 31 '25 11:10


2 Answers

There is no overflow in the code shown in C implementations where char is eight bits. (In a C implementation in which char is signed and the same width as int, there would be overflow.) This is because the c + 1 and ++ operations are performed using the int type. Your lecturer and your friend are wrong. The code loops forever because c < (c + 1) is always true, without overflow and with or without optimization.

The remainder of this answer presumes char is eight bits and int is 32 bits. Citations are given from C 2024 but the issues discussed are the same in prior versions of the C standard, to 1999 at least. As used by the C standard, “overflow” means an operation is specified to have a result that is not representable in the type used for the operation. For example, + for signed int is specified to perform mathematical addition, so 2147483647 + 1 is specified to produce 2,147,483,648, but 2,147,483,648 is not representable in a 32-bit int, so, in a C implementation with 32-bit int, it overflows. Then the behavior is not defined by the C standard.

In c + 1, the usual arithmetic conversions are performed, so the char c is converted to int. For any eight-bit char value, the sum of that value and 1 is representable in an int, so the addition does not overflow. Then, for c < (c + 1), the c on the left is also converted to int, and the two int values are compared. Since the value on the right is 1 more than the value on the left, the comparison is true, and the loop continues.

In c++, there are effectively two operations of concern:

  • int arithmetic is used to add 1 to the value of c. (The use of int arithmetic for this is discussed below.)
  • The sum is converted to char, to be stored into c.

When c is the maximum value of a char, the addition produces a sum that is not representable in char. This is not overflow, since the arithmetic is performed in the int type and the sum is representable in int. For the conversion to char, two cases are possible:

  • char is unsigned. In this case, C 2024 6.3.2.3 specifies the value is wrapped modulo 256: “the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.” So, when c is 255, adding 1 produces 256, and then converting 256 to char yields 256 − 256 = 0.
  • char is signed. In this case, 6.3.2.3 specifies “either the result is implementation-defined or an implementation-defined signal is raised”.

In the signed case, if a signal is raised, that would stop the loop. This is currently rare in C implementations; most produce some implementation-defined result, most often wrapping modulo 256. So, with eight-bit two’s complement, the maximum char value is 127, adding 1 produces 128, conversion to char yields −128, and the loop continues. Note that, with either unsigned or signed char, the result is specified to be a value representable in char or a signal, so there is no overflow (no specified result that is unrepresentable).

++ uses int arithmetic.

C 2024 6.5.3.5 says that postfix ++ adds the value 1 of “the appropriate type” to its operand. Unfortunately, it does not explicitly say the usual arithmetic conversions are applied. However, it does say “See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers,” so the discussions of additive operators (and their conversions) and of assignment apply to postfix ++. The relevant additive operator is +, and its discussion, in 6.5.7, says the usual arithmetic conversions apply. So the addition is performed with int arithmetic, and then of course the result must be converted to char, as specified for assignment operators.

Additionally, for prefix ++, 6.5.4.2 tells us ++E is equivalent to (E+=1), which explicitly invokes the += operator, so the rules for += apply, and from those we know the usual arithmetic conversions are applied. So ++c uses int. It would be bizarre if ++c used int but c++ used char, so this reassures us that the conclusion in the paragraph above is correct.

Eric Postpischil answered Nov 03 '25 02:11


This is a very bad example to use for teaching, for several reasons:

  • First of all, char has implementation-defined signedness: whether it is signed or unsigned depends on the compiler. Without knowing that, we can't say whether it will overflow or wrap around. Overflow is a term used for signed types, and when it happens the program invokes undefined behavior. Wrap-around is a term used for unsigned types, and it is well-defined, going from the highest number to zero.

  • For this reason, it is wrong to expect any particular behavior in a scenario where char actually overflows, which is the case with gcc on targets where it uses signed char (x86, for example).

In the case of signed char, c++ will indeed go out of range once c reaches 127. However, that is not why the loop never ends: the expression c + 1 comes with an implicit promotion to int, so it will always be larger than c no matter what. Optimization has nothing to do with it.

If we modify the example to while (c < (char)(c + 1)), we force the promoted int expression back to char. This is a questionable cast for signed types when the value can't be represented: this too is implementation-defined behavior (compiler dependent), and the program could in theory raise a signal. But with gcc, the scenario 127 + 1 followed by the cast happens to result in -128 without any such problems, stopping the loop.

Lundin answered Nov 03 '25 00:11