Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Formal definition of restrict fails to account for valid cases

The following reasonable program seems to have undefined behavior according to the formal definition of restirct in the C standard:

void positive_intcpy(int * restrict q, const int * restrict p, size_t n) {
  int *qBgn = q;
  const int *pEnd = p + n;
  // sequence point S
  while (p != pEnd && *p>0) *q++ = *p++;
  if (q != qBgn) fprintf(stderr,"Debug: %d.\n",*(q-1)); // undefined behavior!?
}
int main(void) {
  int a[6] = {4,3,2,1,0,-1};
  int b[3];
  positive_intcpy(b,a,3);
  return 0;
}

The function copies integers from one array to another, as long as the integers are positive. The fprintf call displays the last positive integer that was copied (if any). There is never any aliasing between p and q.

Is this really UB, or is my reasoning wrong?

This question concerns section 6.7.3.1 of the C99 standard. The relevant text is unchanged in the latest draft for C23.

We are talking about the pointer expression q-1 marked above. Is it based on the restricted pointer object designated by p?

The standard says:

In what follows, a pointer expression E is said to be based on object P [where P is a restrict-qualified pointer object] if (at some sequence point in the execution of B [where B is the block associated with the declaration of P] prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E [see footnote]. Note that "based" is defined only for expressions with pointer types.

[footnote] In other words, E depends on the value of P itself rather than on the value of an object referenced indirectly through P. For example, if identifier p has type (int **restrict), then the pointer expressions p and p+1 are based on the restricted pointer object designated by p, but the pointer expressions *p and p[1] are not.

In our program, at the sequence point S marked above, modifying p to point to a copy of the a array would cause p != pEnd to always be true (because pEnd is not modified together with p), thus the loop would execute until *p>0 becomes false, thus the value of q at the end of the loop would change (it would be one machine word greater). Therefore we conclude that our expression q-1 is based on p.

Now the standard says:

During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T [where T is the type to which P is declared to point] shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.

In our case, L is *(q-1). X is the object at &b[2], which has been modified in B to the value of 2. However, T is const int, which means the program has UB.

You might say that this is harmless, because the compiler will probably not take advantage of such a hard to prove UB case and so won't mis-optimize it.

But the same logic can be applied in reverse, where it becomes truly dangerous:

int f(int *restrict p, int *restrict q) {
  int *p0=p, *q0=q;
  // sequence point S
  if (p != p0) q++;
  if (q != q0) p++;
  *p = 1;
  *q = 2;
  return *p + *q;
}
int main(void) {
  int x;
  printf("%d\n", f(&x,&x));
  return 0;
}

GCC does optimize here. GCC -O0 prints 4, while GCC -O3 prints 3. Unfortunately, reading the standard literally, GCC must be wrong.

Here's why:

  1. In the expression *q = 2, q is "based on" p (because if p was modified at sequence point S to point to a copy of x, the conditional p != p0 would become true, changing the value of q).
  2. In the expression *p = 1, p is "based on" q (because if q was modified at sequence point S to point to a copy of x, the conditional q != q0 would become true, changing the value of p).
  3. In the body of f, any access to any object via a pointer expression is always both "based on" p and "based on" q. So no restrict-violation, neither of restrict p, nor of restrict q. Note that no restrict-qualified pointer is ever assigned a pointer expression inside f (because p++ and q++ do not actually happen).
  4. Since there is no restrict violation, the program doesn't have UB and the compiler is not allowed to optimize.

It seems to me that the standard failed to define "based on" in the intended way. What is the intended way, then? I mean, clearly there is an intended meaning, which is allowing GCC to optimize, and I believe GCC is morally right in this case. I want to avoid writing programs that might be mis-optimized.

like image 357
log65536 Avatar asked Jan 30 '26 10:01

log65536


1 Answers

Your assessment that the wording of 6.7.3.1p3 allows for the case you mentioned is correct. This wording also goes against the intended meaning.

The intent is first mentioned in section 6.7.3p8:

An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.3.1 below, requires that all accesses to that object use, directly or indirectly, the value of that particular pointer.

So by this description q and q-1 are based on the object q, not the object p.

An example is given in section 6.7.3.1p7:

EXAMPLE 1 The file scope declarations

int * restrict a;
int * restrict b;
extern int c[];

assert that if an object is accessed using one of a, b, or c, and that object is modified anywhere in the program, then it is never accessed using either of the other two

Which further backs up this intent.

By a strict reading of 6.7.3.1p3, the expression q-1 is based on both the objects p and q, which doesn't seem to make much sense and is counter to the above quoted description and example.

The intent of 6.7.3.1p3, I believe, was to capture transitive expressions such as the following:

int x[5];
int * restrict p = x;
int *p1 = p+2;    // p1 is based on p
int *p2 = p1-1;   // p2 is also based on p

As simply stating "an expression including p among its operands" wouldn't account for such a case.

A possible rewording of 6.7.3.1p3 could be the following:

A pointer expression E is said to be based on object P if any of the following holds:

  • E contains P as an operand
  • E contains an lvalue V as an operand, and V was set to the value of an expression E2 which is based on P.
like image 92
dbush Avatar answered Feb 01 '26 02:02

dbush



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!