
How to determine what is 'sequenced before' others?

I went through this excellent answer regarding Undefined Behaviour and Sequenced [Before/After] relations in C++11. I understand the binary relation concepts, but am missing what the new rules governing sequencing are.

For these familiar examples, how do the new sequencing rules apply?

  1. i = ++i;
  2. a[++i] = i;

More specifically, what are the new C++11 sequencing rules?

I am looking for some rules like (this one is completely made up)

The lhs of an '=' statement is always sequenced before the rhs, and is thus evaluated first.

In case these are available in the standard itself, can someone quote the same here?

Lazer asked Mar 05 '12 06:03



2 Answers

The sequenced-before relationship and the rules concerning it are a "tidying up" of the prior rules on sequence points. They are defined consistently with the other memory model relationships, such as happens-before and synchronizes-with, so that it can be precisely specified which operations and effects are visible under which circumstances.

The consequences of the rules are unchanged for simple single-threaded code.

Let's start with your examples:

1. i = ++i;

If i is a built-in type such as int then there are no function calls involved, everything is a built-in operator. There are thus 4 things that happen:

(a) The value computation of ++i, which is original-value-of-i +1

(b) The side effect of ++i, which stores original-value-of-i +1 back into i

(c) The value computation of the assignment, which is just the value stored, in this case the result of the value computation of ++i

(d) The side effect of the assignment, which stores the new value into i

All of these things are sequenced-before the following full expression. (i.e. they are all complete by the final semicolon of the statement)

Since ++i is equivalent to i+=1, the side effect of storing the value is sequenced-before the value computation of ++i, so (b) is sequenced-before (a).

The value computation of both operands of an assignment is sequenced-before the value computation of the assignment itself, and that is in turn sequenced-before the side effect of storing the value. Therefore (a) is sequenced-before (c), and (c) is sequenced-before (d).

We therefore have (b) -> (a) -> (c) -> (d), and this is thus OK under the new rules, whereas it was not OK under C++98.
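To make the chain (b) -> (a) -> (c) -> (d) concrete, here is a minimal sketch that performs the four steps as separate statements for a built-in int. It is only a model of the ordering, not what a compiler is required to emit, and the function name assign_preincrement_model is made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: models `i = ++i;` for a built-in int by doing the
// four steps one statement at a time, in the order (b) -> (a) -> (c) -> (d).
// Assumes i is not INT_MAX, since signed overflow would be undefined.
int assign_preincrement_model(int& i) {
    const int original = i;

    i = original + 1;               // (b) side effect of ++i
    int& incremented = i;           // (a) value computation of ++i: an lvalue naming i
    const int stored = incremented; // (c) the value the assignment will store
    i = stored;                     // (d) side effect of the assignment

    assert(i == original + 1);      // the second store rewrites the same value
    return i;
}
```

Calling it with i equal to 5 leaves i at 6: the second store simply writes the already-incremented value back.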

If i were of class type, then the expression would be i.operator=(i.operator++()), or i.operator=(operator++(i)), and all effects of the operator++ call are sequenced-before the call to operator=.
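The class-type case can be observed directly by logging the operator calls. This is a sketch under assumptions: Counter and run_class_case are hypothetical names invented for the demonstration, and the log pointer exists only to record the order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical class type used only to log the order of operator calls.
struct Counter {
    int value = 0;
    std::vector<std::string>* log = nullptr;

    Counter& operator++() {                    // pre-increment
        ++value;
        if (log) log->push_back("operator++");
        return *this;
    }

    Counter& operator=(const Counter& other) {
        value = other.value;                   // self-assignment is harmless here
        if (log) log->push_back("operator=");  // the log pointer is deliberately not copied
        return *this;
    }
};

// Runs `i = ++i;` on a Counter and returns the recorded call order.
std::vector<std::string> run_class_case() {
    std::vector<std::string> log;
    Counter i;
    i.log = &log;
    i = ++i;  // i.operator=(i.operator++())
    return log;
}
```

Because the result of operator++ is the argument of operator=, the increment has to finish before the assignment function is entered, so the log always records operator++ first.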

2. a[++i] = i;

If a is an array type, and i is an int, then again the expression has several parts:

(a) The value computation of i

(b) The value computation of ++i

(c) The side effect of ++i, which stores the new value back into i

(d) The value computation of a[++i], which returns an lvalue for the element of a indexed by the value computation of ++i

(e) The value computation of the assignment, which is just the value stored, in this case the result of the value computation of i

(f) The side effect of the assignment, which stores the new value into the array element a[++i]

Again, all of these things are sequenced-before the following full expression. (i.e. they are all complete by the final semicolon of the statement)

Again, since ++i is equivalent to i+=1, the side effect of storing the value is sequenced-before the value computation of ++i, so (c) is sequenced-before (b).

The value computation of the array index ++i is sequenced-before the value computation of the element selection, so (b) is sequenced-before (d).

The value computation of both operands of an assignment is sequenced-before the value computation of the assignment itself, and that is in turn sequenced-before the side effect of storing the value. Therefore (a) and (d) are sequenced-before (e), and (e) is sequenced-before (f).

We therefore have two sequences: (a) -> (d) -> (e) -> (f) and (c) -> (b) -> (d) -> (e) -> (f).

Unfortunately, there is no ordering between (a) and (c). Thus a side effect that stores to i is unsequenced relative to a value computation on i, and the code exhibits undefined behaviour, as specified by 1.9p15 of the C++11 standard.

As above, if i is of class type then everything is fine, because the operators become function calls, which impose sequencing.
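For the built-in case, the usual way out is to put the write to i and the read of i into different full expressions. Here is a minimal sketch, assuming a is an int[4] and the caller keeps i in range. The helper names are hypothetical, and which variant is right depends on what the original statement was meant to do.

```cpp
#include <cassert>

// Hypothetical helpers: two well-defined replacements for `a[++i] = i;`.

// Intent A: store the value i had before the increment.
void store_old_value(int (&a)[4], int& i) {
    assert(i >= -1 && i < 3);  // ++i must yield a valid index 0..3
    const int old_value = i;   // the read completes in its own full expression
    ++i;                       // the write completes in its own full expression
    a[i] = old_value;
}

// Intent B: store the incremented value.
void store_new_value(int (&a)[4], int& i) {
    assert(i >= -1 && i < 3);
    ++i;
    a[i] = i;                  // only reads of i in this statement
}
```

Each statement ends a full expression, so every read and write of i is sequenced relative to the others.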

The rules

The rules are relatively straightforward:

  1. The value computations of the arguments of a built-in operator are sequenced-before the value computation of the operator itself.

  2. The side effects of a built-in assignment operator or preincrement operator are sequenced-before the value computation of the result.

  3. The value computation of any other built-in operator is sequenced-before the side effects of that operator.

  4. The value computation and side-effects of the left-hand side of the built-in comma operator are sequenced-before the value computation and side-effects of the right-hand side.

  5. All value computations and side effects of a full expression are sequenced-before the next full expression.

  6. The value computation and side effects of the arguments of a function call are sequenced-before the first full expression in the function.

  7. The value computation and side effects of everything inside a function are sequenced-before the value computation of the result.

  8. For any two function calls in the full expression, either the value computation of the result of one is sequenced-before the call to the other, or vice-versa. If no other rule specifies the ordering, the compiler may choose.

    Thus in a()+b(), either a() is sequenced-before b(), or b() is sequenced-before a(), but there is no rule to specify which.

  9. If there are two side effects that modify the same variable, and neither is sequenced-before the other, the code has undefined behaviour.

  10. If there is a side effect that modifies a variable, and a value computation that reads that variable, and neither is sequenced-before the other, the code has undefined behaviour.
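Rules 4 and 8 can be contrasted with a small logging sketch. The trace vector and the helpers sum_calls and comma_calls are assumptions made for illustration; a and b mirror the a()+b() example in rule 8.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical global log, used only to record which call ran first.
static std::vector<std::string> trace;

int a() { trace.push_back("a"); return 1; }
int b() { trace.push_back("b"); return 2; }

// Rule 8: the two calls never interleave, but either one may run first.
int sum_calls() {
    trace.clear();
    return a() + b();   // trace ends up as {a, b} or {b, a}; both are conforming
}

// Rule 4: the built-in comma operator sequences its left side before its right.
int comma_calls() {
    trace.clear();
    return (a(), b());  // a() always runs first; the result is b()'s value
}
```

Code that depends on the order recorded by sum_calls is relying on the compiler's choice, whereas the order recorded by comma_calls is guaranteed.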

Anthony Williams answered Dec 29 '22 11:12



In my opinion this is a much more complex rule than the old sequence-point rule, and I'm not 100% sure I understood it correctly. Anyway, IIUC it all boils down to whether, in order to get the value, you need the side effect to have been applied already.

First case

i = ++i;

Here to do the assignment you need the value of the right part, and to get that value you need the side effect to have been already applied; therefore here the assignment is sequenced after the increment and all is fine. The important point here is that to do the assignment you need the value of RHS and only the address of LHS.

To recap:

  1. assignment is sequenced after &i and ++i
  2. ++i is sequenced after the increment
  3. (transitivity) assignment is sequenced after increment

The value of i is read only once, after the increment. It is written twice, once by the increment and once by the assignment, but these two operations are sequenced (first the increment, then the assignment).

Second case

a[++i] = i;

Here instead you need the value of i for the RHS and the value of ++i for LHS. These two expressions however are not sequenced (the assignment operator is not imposing a sequencing) and therefore the result is undefined.

To recap:

  1. assignment is sequenced after &a[++i] and i
  2. &a[++i] is sequenced after ++i
  3. ++i is sequenced after the increment

Here the value of i is read twice, once for the LHS of the assignment and once for the RHS. The LHS part also does a modification (the increment). This write access and the read access of the assignment RHS are, however, not sequenced with respect to each other, and therefore this expression is UB.

Final rant

Let me repeat that I'm not sure of what I just said... my strong opinion is that this new sequenced before/after approach is much harder to understand. Hopefully the new rules only turned some expressions that used to be UB into well-defined ones (and UB is the worst possible result), but they also made the rules much more complex (the old rule was just "don't change the same thing twice between sequence points"; you didn't have to do a mental topological sort to work out whether something was UB or not).

In a sense the new rules did no damage to C++ programs (UB is the enemy, and now there's less UB in this area) but did damage the language by increasing complexity (and added complexity is certainly something C++ didn't need).

Note also that the funny thing about ++i is that the returned value is an l-value (that's why ++ ++ i is legal), so it's basically an address, and it was not logically necessary for the returned value to be sequenced after the increment. But the standard says so and this is the rule you need to burn into your neurons. Of course to have a "usable" ++i you want the users of the value to get the updated value, but still, as far as the ++ operator sees things (it's returning an address that is unaffected by the increment), this sequencing was not formally needed.
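The l-value claim can be checked at compile time. This is a small sketch for built-in int only; the helper name preincrement_refers_to_same_object is made up, and decltype leaves its operand unevaluated, so the static_asserts modify nothing.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Pre-increment of an int lvalue yields an lvalue (int&) ...
static_assert(std::is_same<decltype(++std::declval<int&>()), int&>::value,
              "++i is an lvalue of type int");
// ... while post-increment yields a prvalue (int).
static_assert(std::is_same<decltype(std::declval<int&>()++), int>::value,
              "i++ is a prvalue of type int");

// Hypothetical helper: true if the result of ++i names the object i itself.
bool preincrement_refers_to_same_object(int& i) {
    int& result = ++i;      // binds only because ++i is an lvalue
    return &result == &i;
}
```

The address comparison is the sense in which the result is "basically an address": the same object is named whether or not the increment has been applied yet.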

With the new rules you not only need to do a mental topological sort to see if an expression is valid, but you also need to do it using arbitrary sequencing relations that you just have to memorize.

While of course you as a programmer will hopefully never write code that changes the same value multiple times without a crystal-clear sequence, still you will be faced with bugs in code written by other programmers... where things are not as clear and where you now need to think harder to just understand if something is legal C++ or not.

6502 answered Dec 29 '22 10:12
