Is accessing outer multidimensional array using reference to inner array defined in C?

Question

Consider the following piece of code:

void process(int *);

int func(void) {
    int arr[2][2] = {{1,2},{3,4}};
    process(arr[0]);
    return arr[1][0];
}

Assume the function process is implemented in Assembly language.

As explained in "Is accessing an element of a multidimensional array out of bounds undefined behavior?", it is undefined behaviour to access the element arr[1][0] as arr[0][2], even though pointer arithmetic makes both expressions refer to the same memory location.
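The address equality described above can be checked without invoking undefined behaviour. The sketch below (the function name rows_are_adjacent is hypothetical) only forms the one-past-the-end pointer of arr[0] and compares it with &arr[1][0]. It never dereferences that pointer.

```c
#include <stdbool.h>

/* Hypothetical helper: checks that the one-past-the-end pointer of
   row 0 has the same address as the first element of row 1.
   Forming arr[0] + 2 and comparing it with == are both defined;
   only dereferencing it (i.e. reading arr[0][2]) would be undefined. */
bool rows_are_adjacent(void)
{
    int arr[2][2] = {{1, 2}, {3, 4}};
    int *one_past_row0 = arr[0] + 2;  /* same address as &arr[0][2] */
    int *start_row1 = &arr[1][0];
    return one_past_row0 == start_row1;
}
```

This returns true because the rows of a 2-D array are laid out back to back with no padding. The point of the question is that equal addresses do not turn *one_past_row0 into a valid access.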

Since process only gets access to arr[0][0] and arr[0][1], can the C compiler compiling the above code assume that arr[1][0] is unchanged after the call to process and hence, for example, optimise the last line to return 3;?

Or is it perhaps not undefined behaviour, because the compiler has to assume that process might internally cast arr[0] to a pointer to int[2][2], similar to how the common container_of macro obtains a pointer to a struct from a pointer to one of its members? In that case the compiler would not be allowed to perform the optimisation above.
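For readers unfamiliar with it, container_of (best known from the Linux kernel) recovers a pointer to an enclosing struct from a pointer to one of its members by subtracting the member's offset. The macro below is a simplified sketch, not the kernel's actual definition (which adds type checking), and the struct pair and first_from_second names are hypothetical.

```c
#include <stddef.h>

/* Simplified container_of: step back from a member's address to the
   start of the enclosing struct. Not the Linux kernel's exact macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct pair {            /* hypothetical example struct */
    int first;
    int second;
};

/* Given only a pointer to the 'second' member, read 'first'. */
int first_from_second(int *second_ptr)
{
    struct pair *p = container_of(second_ptr, struct pair, second);
    return p->first;
}
```

The question is whether process may take the analogous step from arr[0] back to the whole of arr.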

sayurc · Accepted Answer

It is Undefined Behavior and the Compiler can Optimize Based on That

In this line

process(arr[0]);

the function process receives a pointer to the first element of the array arr[0]. The C99 standard says the following about pointer arithmetic in section 6.5.6:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

and

If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

So while process is allowed to increment the pointer to one past the last element of arr[0], reading or writing that location would be undefined behavior, and moving the pointer any further would be undefined behavior too. Therefore, although all elements of arr are stored contiguously in memory, process cannot use this pointer to access arr[1]; doing so is undefined behavior.
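To illustrate what process may legally do with the pointer it receives, here is a hypothetical body (named sum_row, and assuming the callee knows each row holds 2 ints) that walks the row using a one-past-the-end pointer as its loop bound, without ever dereferencing that bound.

```c
/* Hypothetical, well-defined use of the pointer passed to process:
   iterate over the two elements of the row it points into.
   'end' is a valid one-past-the-end pointer; it is compared against
   but never dereferenced. Computing p + 3 or reading *end would be
   undefined behavior. */
int sum_row(const int *p)
{
    const int *end = p + 2;
    int sum = 0;
    for (const int *it = p; it != end; ++it)
        sum += *it;
    return sum;
}
```

With arr from the question, sum_row(arr[0]) only ever reads arr[0][0] and arr[0][1].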

The standard defines undefined behavior as behavior for which it imposes no requirements (C99 3.4.3), so the compiler may assume that code triggering undefined behavior never executes. So, if my interpretation of the rules above is correct, I believe the compiler is allowed to disregard the possibility that process modifies arr[1][0] and simply return 3.
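Concretely, the permitted transformation can be sketched as follows. The process defined here is a hypothetical stand-in for the assembly routine; it touches only the two elements of the row it receives, which is all a well-defined callee can do. func_as_optimised (also a hypothetical name) shows what the compiler may turn func into.

```c
/* Hypothetical stand-in for the assembly routine: it only writes
   the two elements of the row it was given. */
void process(int *p)
{
    p[0] = 10;
    p[1] = 20;
}

/* The function from the question. */
int func(void)
{
    int arr[2][2] = {{1, 2}, {3, 4}};
    process(arr[0]);
    return arr[1][0];
}

/* What the compiler may emit instead: no defined behavior of process
   can modify arr[1][0], so the load is replaced by the constant.
   The call itself stays, since process may have other effects. */
int func_as_optimised(void)
{
    int arr[2][2] = {{1, 2}, {3, 4}};
    process(arr[0]);
    return 3;
}
```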

Or is it perhaps not undefined behaviour, because the compiler has to assume that process might internally cast arr[0] to a pointer to int[2][2], similar to how the common container_of macro obtains a pointer to a struct from a pointer to one of its members? In that case the compiler would not be allowed to perform the optimisation above.

If by this you mean casting the int * to an int ** and then using it like a two-dimensional array, then you misunderstand what this actually does. An int * points to an int, while an int ** points to an int *, so you would be treating the int that the int * points to as if it were itself an int *. Pointers are not arrays: int ** does not work the same way as int [][2].
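The difference between the two pointer types can be seen in a short sketch (the function names are hypothetical). Indexing through int (*)[2] computes an address from the row size, while indexing through int ** loads a stored pointer, so it only works when there is an actual array of int * to point to.

```c
/* rows has type "pointer to array of 2 int", which is what arr
   decays to. rows[1] is rows + 1, i.e. one whole row (2 ints)
   further on, so rows[1][0] reads arr[1][0]. */
int read_via_row_pointer(int (*rows)[2])
{
    return rows[1][0];
}

/* pp has type "pointer to pointer to int". pp[1] reads an int *
   value stored in memory, then [0] dereferences that pointer.
   It needs a real array of row pointers to point to. */
int read_via_pointer_to_pointer(int **pp)
{
    return pp[1][0];
}
```

Passing arr itself to read_via_pointer_to_pointer is a constraint violation that a conforming compiler must diagnose, because int (*)[2] and int ** are incompatible types; it works only with a separate array such as int *row_ptrs[2] = {arr[0], arr[1]}.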

Also, the fact that process is implemented in Assembly is irrelevant: the call is made from C, so the compiler can still make the same assumptions. In fact, the compiler generally has no way of knowing whether a function was implemented in Assembly or not, since both C and Assembly sources are translated into object files that are then combined by the linker. When compiling func, all your compiler does is turn the C code into an object file that calls some function named process using the platform's calling convention.

(Failed) Attempt to Get Around It

Someone in the comments proposed this definition for process:

void process(int *p)
{
    int (*q)[2] = (int(*)[2])p;
    q[1][0] = 17;
}

Section 6.3.2.3 of the standard covers pointer conversions, and the following paragraph addresses the case of the code above:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

So if the resulting pointer is not correctly aligned, the conversion itself is undefined behavior. However, even when the conversion is valid, the only guarantee the standard gives you is that converting the pointer back to the original type yields a pointer that compares equal to the original. Conversions to a character type are different: you can increment the resulting char * and each result points to a valid object (a byte, i.e. a char object), which implies you can access each byte of an object. You still cannot access arr[1] this way, though, because the pointer arithmetic rules do not let you go past the elements of the array you are pointing into. Notice that those rules do not mention the type of the pointer, so a char * derived from arr[0] still points into the array arr[0], whose type is int [2].
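The two guarantees from 6.3.2.3 that this relies on, round-trip conversion and byte-wise access through a character pointer, can be sketched as follows (the function names are hypothetical). Neither lets the code step from arr[0] into arr[1].

```c
#include <stdbool.h>
#include <stddef.h>

/* Round trip: convert int * to int (*)[2] and back. The standard
   only guarantees that the result compares equal to the original,
   assuming p is suitably aligned for int[2]. */
bool round_trip_equal(int *p)
{
    int (*q)[2] = (int (*)[2])p;
    return (int *)q == p;
}

/* Byte-wise access: an object pointer converted to unsigned char *
   may be incremented up to sizeof the object, and each byte accessed.
   Here the bytes of *src are copied into *dst one at a time. */
void copy_int_bytes(int *dst, const int *src)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    for (size_t i = 0; i < sizeof *src; ++i)
        d[i] = s[i];
}
```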

The standard also specifies, earlier in the same section, that a pointer to any object type can be converted to void * and back, but a void * cannot be dereferenced or used in pointer arithmetic, so it does not help here.

Tags: c, language-lawyer

Asked by Emil
