 

Defined behaviour for expressions

The C99 Standard says in §6.5, paragraph 2:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

(emphasis mine)

It goes on to note that the following example is valid (which seems obvious at first):

a[i] = i;

It does not, however, explicitly state what a and i are.

Although I believe it does not, I'd like to know whether this example covers the following case:

int i = 0, *a = &i;
a[i] = i;

This will not change the value of i, but it does read the value of i to determine the address at which to store the value. Or is it irrelevant that we assign to i a value that is already stored in i? Please shed some light.


Bonus question: what about a[i]++ or a[i] = 1?

asked Jan 29 '12 at 19:01 by bitmask



1 Answer

The first sentence:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.

is clear enough. The language doesn't impose an order of evaluation on subexpressions unless there's a sequence point between them, and rather than requiring some unspecified order of evaluation, it says that modifying an object twice produces undefined behavior. This allows aggressive optimization while still making it possible to write code that follows the rules.

The next sentence:

Furthermore, the prior value shall be read only to determine the value to be stored

does seem unintuitive at first (and second) glance; why should the purpose for which a value is read affect whether an expression has defined behavior?

But what it reflects is that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated. The C90 and C99 standards do not state this explicitly.

A clearer violation of that sentence, given in an example in the footnote, is:

a[i++] = i; /* undefined behavior */

Assuming that a is a declared array object and i is a declared integer object (no pointer or macro trickery), no object is modified more than once, so it doesn't violate the first sentence. But the evaluation of i++ on the LHS determines which object is to be modified, and the evaluation of i on the RHS determines the value to be stored in that object -- and the relative order of the read operation on the RHS and the write operation on the LHS is not defined. Again, the language could have required the subexpressions to be evaluated in some unspecified order, but instead it left the entire behavior undefined, to permit more aggressive optimization.

In your example:

int i = 0, *a = &i;
a[i] = i; /* undefined behavior (I think) */

the previous value of i is read both to determine the value to be stored and to determine which object it's going to be stored in. Since a[i] refers to i (but only because i==0), modifying the value of i would change the object to which the lvalue a[i] refers. It happens in this case that the value stored in i is the same as the value that was already stored there (0), but the standard doesn't make an exception for stores that happen to store the same value. I believe the behavior is undefined. (Of course the example in the standard wasn't intended to cover this case; it implicitly assumes that a is a declared array object unrelated to i.)

As for the example that the standard says is allowed:

int a[10], i = 0; /* implicit, not stated in standard */
a[i] = i;

one could interpret the standard to say that it's undefined. But I think that the second sentence, referring to "the prior value", applies only to the value of an object that's modified by the expression. i is never modified by the expression, so there's no conflict. The value of i is used both to determine the object to be modified by the assignment, and the value to be stored there, but that's ok, since the value of i itself never changes. The value of i isn't "the prior value", it's just the value.

The C11 standard has a new model for this kind of expression evaluation -- or rather, it expresses the same model in different words. Rather than "sequence points", it talks about side effects being sequenced before or after each other, or unsequenced relative to each other. It makes explicit the idea that if a subexpression B depends on the result of a subexpression A, then A must be evaluated before B can be evaluated.

In the N1570 draft, section 6.5 says:

1 An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.

2 If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

3 The grouping of operators and operands is indicated by the syntax. Except as specified later, side effects and value computations of subexpressions are unsequenced.

answered Nov 23 '22 at 23:11 by Keith Thompson
