Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

(Why) is using an uninitialized variable undefined behavior?

If I have:

unsigned int x; x -= x; 

it's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x (until before the subtraction).

Two questions:

  • Is the behavior of this code indeed undefined?
    (E.g. Might the code crash [or worse] on a compliant system?)

  • If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?

    i.e. What is the advantage given by not defining the behavior here?

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

like image 984
user541686 Avatar asked Aug 14 '12 23:08

user541686


People also ask

What happens if you don't initialize a variable?

If you don't initialize an variable that's defined inside a function, the variable value remain undefined. That means the element takes on whatever value previously resided at that location in memory.

What happens when you try to use an uninitialized variable?

An uninitialized variable has an undefined value, often corresponding to the data that was already in the particular memory location that the variable is using. This can lead to errors that are very hard to detect since the variable's value is effectively random, different values cause different errors or none at all.

Why are uninitialized variables bad?

"Uninitialized variables contain some value" is a incorrect statement which unfortunately is teached. A program who access an uninitialized variable has Undefined Behavior, which means it can have any behavior.

What does it mean when a variable is uninitialized?

In computing, an uninitialized variable is a variable that is declared but is not set to a definite known value before it is used. It will have some value, but not a predictable one. As such, it is a programming error and a common source of bugs in software.


2 Answers

Yes this behavior is undefined but for different reasons than most people are aware of.

First, using an unitialized value is by itself not undefined behavior, but the value is simply indeterminate. Accessing this then is UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.

What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register" that is its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.

Edit: The relevant phrase of the standard is 6.3.2.1p2:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

And to make it clearer, the following code is legal under all circumstances:

unsigned char a, b; memcpy(&a, &b, 1); a -= a; 
  • Here the addresses of a and b are taken, so their value is just indeterminate.
  • Since unsigned char never has trap representations that indeterminate value is just unspecified, any value of unsigned char could happen.
  • At the end a must hold the value 0.

Edit2: a and b have unspecified values:

3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

like image 71
Jens Gustedt Avatar answered Sep 26 '22 13:09

Jens Gustedt


The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.

Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2n-1-1) and 2n-1-1 as a trap representation. See Jens Gustedt's answer.

Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment

unsigned x, y, z;   /* 0 */ y = 0;              /* 1 */ z = 4;              /* 2 */ x = - x;            /* 3 */ y = y + z;          /* 4 */ x = y + 1;          /* 5 */ 

When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:

r1 = 0; r0 = 4; r0 = - r0; r1 += r0; r0 = r1; 

The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.

For a more elaborate example, consider the following code fragment:

unsigned i, x; for (i = 0; i < 10; i++) {     x = (condition() ? some_value() : -x); } 

Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(), there's no need to test the condition. The compiler may compile this code as if you'd written

unsigned i, x; i = 0; /* if some_value() uses i */ x = some_value(); for (i = 1; i < 10; i++) {     x = (condition() ? some_value() : -x); } 

The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:

  • during the first loop iteration, x is uninitialized by the time -x is evaluated.
  • -x has undefined behavior, so its value is whatever-is-convenient.
  • The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.

When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.

I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)

This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.

like image 32
Gilles 'SO- stop being evil' Avatar answered Sep 26 '22 13:09

Gilles 'SO- stop being evil'