
What would happen if "i = i++" was not considered undefined behavior? [closed]

I'm having trouble understanding the difference between unspecified and undefined behavior. I think trying to understand some examples would be useful. For instance, x = x++. The problem with this assignment is that:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

This violates a "shall" rule but does not explicitly invoke undefined behavior. It does, however, involve UB according to:

The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined.

Assuming none of these rules existed and there are no other rules that "invalidate" x = x++, the value of x would then be unspecified, right?

The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construction is valid.

Edit: As pointed out by P.W, there is a somewhat related and well-received version of this question for C++: What made i = i++ + 1; legal in C++17?

asked Dec 24 '22 by jinawee


2 Answers

I'm having trouble understanding the difference between unspecified and undefined behavior.

Then let's start with the definitions of those terms from the Standard:


undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

EXAMPLE An example of undefined behavior is the behavior on integer overflow.

(C2011, 3.4.3)


unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.

(C2011, 3.4.4)


You remark that

The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construction is valid.

It is perhaps over-aggrandizing to call that an argument, as if there were some doubt about its validity. In truth, it reflects explicit language from the standard:

If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ''undefined behavior'' or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ''behavior that is undefined''.

(C2011, 4/2; emphasis added)

You posit:

Assuming none of these rules existed and there are no other rules that "invalidate" x = x++.

That doesn't necessarily change anything. In particular, removing the explicit rule that the order of evaluation of the operands is unspecified does not make the order specified. I'd be inclined to argue that the order would remain unspecified, but the alternative reading is that the behavior would be undefined, by omission of any explicit definition. The primary purpose served by explicitly saying it's unspecified is to sidestep that question.

The rule explicitly declaring UB when an object is modified twice between sequence points is a little less clear-cut, but it falls in the same boat. One could argue that without it, the standard still would not define behavior for your example case, leaving it undefined. I think that's a bit more of a stretch, but that's exactly why it is useful to have an explicit rule, one way or the other. It would be possible to define behavior for your case (Java does, for example), but C chooses not to, for a variety of technical and historical reasons.

The value of x would then be unspecified, right?

That's not entirely clear.

Please understand, too, that the various provisions of the standard for the most part do not stand alone. They are designed to work together, as a (mostly) coherent whole. Removing or altering random provisions carries considerable risk of producing inconsistencies or gaps, which makes the result difficult to reason about.

answered Dec 31 '22 by John Bollinger


Modern C (C11/C17) has changed the wording, but the meaning is pretty much the same. C17 6.5/2:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

There are several slightly different issues here, mixed into one:

  • Between sequence points, x is written to (side effect) more than once. This is UB as per the above.
  • Between sequence points, the expression contains at least one side effect on x, and there is also a value computation of x that is not used to determine the value to be stored. This is also UB as per the above.
  • In the expression x = x++, the evaluation of the operand x is not sequenced in relation to the operand x++. The evaluation order is unspecified behavior as per C17 6.5.16.

    The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.

Even without the first cited part labelling this UB, we still wouldn't know whether x++ would be sequenced before or after the evaluation of the left operand x, so it is hard to see how this could become "just unspecified behavior".

C++17 actually fixed this part, making the expression well-defined there, unlike in C or earlier C++ versions. It did so by defining the sequencing order (C++17 8.5.18):

In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand.

I don't see how there can be any middle ground here; either the expression is undefined or it is well-defined.


Unspecified behavior is behavior where the implementation picks one of several permitted possibilities, and we cannot know or assume which one. But unlike undefined behavior, it won't cause crashes or random program behavior. A good example is a() + b(). We can't know which function will be executed first; the program doesn't even have to be consistent if the same line appears later in the same program. But we can know that both functions will be executed, one before the other.

That is unlike x = a() + b() + x++, which is undefined behavior, so we can't assume anything about it. One, both or none of the functions might be executed, in any order. The program might crash, produce incorrect results, produce seemingly correct results or do nothing at all.

answered Dec 31 '22 by Lundin