Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is f(i = -1, i = -1) undefined behavior?

People also ask

What causes undefined Behaviour?

The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.” In very simple words, an object when modified more than once between two sequence points will result in undefined behaviour.

What causes undefined behavior C++?

In C/C++ bitwise shifting a value by a number of bits which is either a negative number or is greater than or equal to the total number of bits in this value results in undefined behavior.

What does undefined behavior mean in C?

So, in C/C++ programming, undefined behavior means when the program fails to compile, or it may execute incorrectly, either crashes or generates incorrect results, or when it may fortuitously do exactly what the programmer intended.


Since the operations are unsequenced, there is nothing to say that the instructions performing the assignment cannot be interleaved. It might be optimal to do so, depending on CPU architecture. The referenced page states this:

If A is not sequenced before B and B is not sequenced before A, then two possibilities exist:

  • evaluations of A and B are unsequenced: they may be performed in any order and may overlap (within a single thread of execution, the compiler may interleave the CPU instructions that comprise A and B)

  • evaluations of A and B are indeterminately-sequenced: they may be performed in any order but may not overlap: either A will be complete before B, or B will be complete before A. The order may be the opposite the next time the same expression is evaluated.

That by itself doesn't seem like it would cause a problem - assuming that the operation being performed is storing the value -1 into a memory location. But there is also nothing to say that the compiler cannot optimize that into a separate set of instructions that has the same effect, but which could fail if the operation was interleaved with another operation on the same memory location.

For example, imagine that it was more efficient to zero the memory, then decrement it, compared with loading the value -1 in. Then this:

f(i=-1, i=-1)

might become:

clear i
clear i
decr i
decr i

Now i is -2.

It is probably a bogus example, but it is possible.


First, "scalar object" means a type like a int, float, or a pointer (see What is a scalar Object in C++?).


Second, it may seem more obvious that

f(++i, ++i);

would have undefined behavior. But

f(i = -1, i = -1);

is less obvious.

A slightly different example:

int i;
f(i = 1, i = -1);
std::cout << i << "\n";

What assignment happened "last", i = 1, or i = -1? It's not defined in the standard. Really, that means i could be 5 (see harmic's answer for a completely plausible explanation for how this chould be the case). Or you program could segfault. Or reformat your hard drive.

But now you ask: "What about my example? I used the same value (-1) for both assignments. What could possibly be unclear about that?"

You are correct...except in the way the C++ standards committee described this.

If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.

They could have made a special exception for your special case, but they didn't. (And why should they? What use would that ever possibly have?) So, i could still be 5. Or your hard drive could be empty. Thus the answer to your question is:

It is undefined behavior because it is not defined what the behavior is.

(This deserves emphasis because many programmers think "undefined" means "random", or "unpredictable". It doesn't; it means not defined by the standard. The behavior could be 100% consistent, and still be undefined.)

Could it have been defined behavior? Yes. Was it defined? No. Hence, it is "undefined".

That said, "undefined" doesn't mean that a compiler will format your hard drive...it means that it could and it would still be a standards-compliant compiler. Realistically, I'm sure g++, Clang, and MSVC will all do what you expected. They just wouldn't "have to".


A different question might be Why did the C++ standards committee choose to make this side-effect unsequenced?. That answer will involve history and opinions of the committee. Or What is good about having this side-effect unsequenced in C++?, which permits any justification, whether or not it was the actual reasoning of the standards committee. You could ask those questions here, or at programmers.stackexchange.com.


A practical reason to not make an exception from the rules just because the two values are the same:

// config.h
#define VALUEA  1

// defaults.h
#define VALUEB  1

// prog.cpp
f(i = VALUEA, i = VALUEB);

Consider the case this was allowed.

Now, some months later, the need arises to change

 #define VALUEB 2

Seemingly harmless, isn't it? And yet suddenly prog.cpp wouldn't compile anymore. Yet, we feel that compilation should not depend on the value of a literal.

Bottom line: there is no exception to the rule because it would make successful compilation depend on the value (rather the type) of a constant.

EDIT

@HeartWare pointed out that constant expressions of the form A DIV B are not allowed in some languages, when B is 0, and cause compilation to fail. Hence changing of a constant could cause compilation errors in some other place. Which is, IMHO, unfortunate. But it is certainly good to restrict such things to the unavoidable.


The confusion is that storing a constant value into a local variable is not one atomic instruction on every architecture the C is designed to be run on. The processor the code runs on matters more than the compiler in this case. For example, on ARM where each instruction can not carry a complete 32 bits constant, storing an int in a variable needs more that one instruction. Example with this pseudo code where you can only store 8 bits at a time and must work in a 32 bits register, i is a int32:

reg = 0xFF; // first instruction
reg |= 0xFF00; // second
reg |= 0xFF0000; // third
reg |= 0xFF000000; // fourth
i = reg; // last

You can imagine that if the compiler wants to optimize it may interleave the same sequence twice, and you don't know what value will get written to i; and let's say that he is not very smart:

reg = 0xFF;
reg |= 0xFF00;
reg |= 0xFF0000;
reg = 0xFF;
reg |= 0xFF000000;
i = reg; // writes 0xFF0000FF == -16776961
reg |= 0xFF00;
reg |= 0xFF0000;
reg |= 0xFF000000;
i = reg; // writes 0xFFFFFFFF == -1

However in my tests gcc is kind enough to recognize that the same value is used twice and generates it once and does nothing weird. I get -1, -1 But my example is still valid as it is important to consider that even a constant may not be as obvious as it seems to be.


Behavior is commonly specified as undefined if there is some conceivable reason why a compiler which was trying to be "helpful" might do something which would cause totally unexpected behavior.

In the case where a variable is written multiple times with nothing to ensure that the writes happen at distinct times, some kinds of hardware might allow multiple "store" operations to be performed simultaneously to different addresses using a dual-port memory. However, some dual-port memories expressly forbid the scenario where two stores hit the same address simultaneously, regardless of whether or not the values written match. If a compiler for such a machine notices two unsequenced attempts to write the same variable, it might either refuse to compile or ensure that the two writes cannot get scheduled simultaneously. But if one or both of the accesses is via a pointer or reference, the compiler might not always be able to tell whether both writes might hit the same storage location. In that case, it might schedule the writes simultaneously, causing a hardware trap on the access attempt.

Of course, the fact that someone might implement a C compiler on such a platform does not suggest that such behavior shouldn't be defined on hardware platforms when using stores of types small enough to be processed atomically. Trying to store two different values in unsequenced fashion could cause weirdness if a compiler isn't aware of it; for example, given:

uint8_t v;  // Global

void hey(uint8_t *p)
{
  moo(v=5, (*p)=6);
  zoo(v);
  zoo(v);
}

if the compiler in-lines the call to "moo" and can tell it doesn't modify "v", it might store a 5 to v, then store a 6 to *p, then pass 5 to "zoo", and then pass the contents of v to "zoo". If "zoo" doesn't modify "v", there should be no way the two calls should be passed different values, but that could easily happen anyway. On the other hand, in cases where both stores would write the same value, such weirdness could not occur and there would on most platforms be no sensible reason for an implementation to do anything weird. Unfortunately, some compiler writers don't need any excuse for silly behaviors beyond "because the Standard allows it", so even those cases aren't safe.