The following reasonable program seems to have undefined behavior according to the formal definition of restirct in the C standard:
void positive_intcpy(int * restrict q, const int * restrict p, size_t n) {
int *qBgn = q;
const int *pEnd = p + n;
// sequence point S
while (p != pEnd && *p>0) *q++ = *p++;
if (q != qBgn) fprintf(stderr,"Debug: %d.\n",*(q-1)); // undefined behavior!?
}
int main(void) {
int a[6] = {4,3,2,1,0,-1};
int b[3];
positive_intcpy(b,a,3);
return 0;
}
The function copies integers from one array to another, as long as the integers are positive. The fprintf call displays the last positive integer that was copied (if any). There is never any aliasing between p and q.
Is this really UB, or is my reasoning wrong?
This question concerns section 6.7.3.1 of the C99 standard. The relevant text is unchanged in the latest draft for C23.
We are talking about the pointer expression q-1 marked above. Is it based on the restricted pointer object designated by p?
The standard says:
In what follows, a pointer expression
Eis said to be based on objectP[wherePis a restrict-qualified pointer object] if (at some sequence point in the execution ofB[whereBis the block associated with the declaration ofP] prior to the evaluation ofE) modifyingPto point to a copy of the array object into which it formerly pointed would change the value ofE[see footnote]. Note that "based" is defined only for expressions with pointer types.[footnote] In other words,
Edepends on the value ofPitself rather than on the value of an object referenced indirectly throughP. For example, if identifierphas type(int **restrict), then the pointer expressionspandp+1are based on the restricted pointer object designated byp, but the pointer expressions*pandp[1]are not.
In our program, at the sequence point S marked above, modifying p to point to a copy of the a array would cause p != pEnd to always be true (because pEnd is not modified together with p), thus the loop would execute until *p>0 becomes false, thus the value of q at the end of the loop would change (it would be one machine word greater). Therefore we conclude that our expression q-1 is based on p.
Now the standard says:
During each execution of
B, letLbe any lvalue that has&Lbased onP. IfLis used to access the value of the objectXthat it designates, andXis also modified (by any means), then the following requirements apply:T[whereTis the type to whichPis declared to point] shall not be const-qualified. Every other lvalue used to access the value ofXshall also have its address based onP. Every access that modifiesXshall be considered also to modifyP, for the purposes of this subclause. IfPis assigned the value of a pointer expressionEthat is based on another restricted pointer objectP2, associated with blockB2, then either the execution ofB2shall begin before the execution ofB, or the execution ofB2shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.
In our case, L is *(q-1). X is the object at &b[2], which has been modified in B to the value of 2. However, T is const int, which means the program has UB.
You might say that this is harmless, because the compiler will probably not take advantage of such a hard to prove UB case and so won't mis-optimize it.
But the same logic can be applied in reverse, where it becomes truly dangerous:
int f(int *restrict p, int *restrict q) {
int *p0=p, *q0=q;
// sequence point S
if (p != p0) q++;
if (q != q0) p++;
*p = 1;
*q = 2;
return *p + *q;
}
int main(void) {
int x;
printf("%d\n", f(&x,&x));
return 0;
}
GCC does optimize here. GCC -O0 prints 4, while GCC -O3 prints 3.
Unfortunately, reading the standard literally, GCC must be wrong.
Here's why:
*q = 2, q is "based on" p (because if p was modified at sequence point S to point to a copy of x, the conditional p != p0 would become true, changing the value of q).*p = 1, p is "based on" q (because if q was modified at sequence point S to point to a copy of x, the conditional q != q0 would become true, changing the value of p).f, any access to any object via a pointer expression is always both "based on" p and "based on" q. So no restrict-violation, neither of restrict p, nor of restrict q. Note that no restrict-qualified pointer is ever assigned a pointer expression inside f (because p++ and q++ do not actually happen).restrict violation, the program doesn't have UB and the compiler is not allowed to optimize.It seems to me that the standard failed to define "based on" in the intended way. What is the intended way, then? I mean, clearly there is an intended meaning, which is allowing GCC to optimize, and I believe GCC is morally right in this case. I want to avoid writing programs that might be mis-optimized.
Your assessment that the wording of 6.7.3.1p3 allows for the case you mentioned is correct. This wording also goes against the intended meaning.
The intent is first mentioned in section 6.7.3p8:
An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.3.1 below, requires that all accesses to that object use, directly or indirectly, the value of that particular pointer.
So by this description q and q-1 are based on the object q, not the object p.
An example is given in section 6.7.3.1p7:
EXAMPLE 1 The file scope declarations
int * restrict a; int * restrict b; extern int c[];assert that if an object is accessed using one of
a,b, orc, and that object is modified anywhere in the program, then it is never accessed using either of the other two
Which further backs up this intent.
By a strict reading of 6.7.3.1p3, the expression q-1 is based on both the objects p and q, which doesn't seem to make much sense and is counter to the above quoted description and example.
The intent of 6.7.3.1p3, I believe, was to capture transitive expressions such as the following:
int x[5];
int * restrict p = x;
int *p1 = p+2; // p1 is based on p
int *p2 = p1-1; // p2 is also based on p
As simply stating "an expression including p among its operands" wouldn't account for such a case.
A possible rewording of 6.7.3.1p3 could be the following:
A pointer expression E is said to be based on object P if any of the following holds:
- E contains P as an operand
- E contains an lvalue V as an operand, and V was set to the value of an expression E2 which is based on P.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With