It is explained elsewhere on stackoverflow (e.g. here, where unfortunately the accepted answer is currently incorrect, but at least the highest-upvoted answer is correct) that the C standard provides that in almost all circumstances an array char my_array[50] will be implicitly converted to a char * when it is used, e.g. when passed to a function as do_something(my_array), given a declaration of void do_something(char *stuff) {}. That is, the code
void do_something(char *my_array) {
// Do something
}
void do_something_2(char my_array[50]) {
// Do something
}
int main() {
char my_array[50];
do_something(my_array);
do_something_2(my_array);
return 0;
}
is compiled by gcc without any warnings on any strictness level.
However, C11 6.3.2.1 paragraph 3 provides that this conversion does not occur if one writes &my_array or sizeof(my_array) (and that, apart from a string literal used to initialize an array, these are the only times when the conversion does not occur). The purpose of the latter rule is obvious to me: the sizeof of an array being equal to the size of a pointer to its first element would be very confusing, so it should be prevented.
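To make the sizeof exception concrete, here is a minimal sketch; the function names param_size and caller_size are made up for illustration. Inside a function, a parameter written as char stuff[50] is really a char *, so sizeof sees only a pointer, whereas in the scope that declares the array, sizeof sees all 50 bytes. gcc usually warns about the sizeof in param_size, which is exactly the confusion being described.

```c
#include <stddef.h>

/* Hypothetical helper: the parameter is written as char stuff[50], but it is
   adjusted to char *stuff, so sizeof reports the size of a pointer. */
size_t param_size(char stuff[50])
{
    return sizeof stuff;
}

/* Where the array itself is in scope, sizeof suppresses the array-to-pointer
   conversion, so it reports the size of the whole array. */
size_t caller_size(void)
{
    char my_array[50];
    return sizeof my_array;
}
```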
But the purpose of the first part of this rule (the part about &my_array) entirely escapes me. The rule makes the type of &my_array (in the notation of the C standard) char (*)[50] instead of char *. When is this behaviour of any use at all? Indeed, except for sizeof purposes, why does the type char (*)[50] exist at all?
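A small sketch (hypothetical function names) of what the type char (*)[50] actually changes: my_array and &my_array denote the same address, but + 1 on the decayed char * moves one byte, while + 1 on the char (*)[50] moves past the whole array. The _Generic checks assume a C11 compiler.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if my_array decays to char * while
   &my_array has type char (*)[50], and both denote the same address.
   _Generic selects by the static type of its unevaluated controlling
   expression; "+ 0" forces the array-to-pointer conversion. */
int types_differ_but_address_same(void)
{
    char my_array[50];
    int decays  = _Generic(my_array + 0, char *: 1, default: 0);
    int address = _Generic(&my_array, char (*)[50]: 1, default: 0);
    return decays && address && (void *)my_array == (void *)&my_array;
}

/* + 1 is scaled by the pointed-to type: one byte for a char *. */
ptrdiff_t step_from_decayed(void)
{
    char my_array[50];
    return (char *)(my_array + 1) - (char *)my_array;
}

/* ...and the whole 50-byte array for a char (*)[50]. */
ptrdiff_t step_from_address(void)
{
    char my_array[50];
    return (char *)(&my_array + 1) - (char *)&my_array;
}
```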
For example, it is also explained on stackexchange (e.g. here) that any parameter declared with an array type, such as char my_array[50] in the definition of do_something_2 above, behaves in every way exactly as if char *my_array had been written in the declaration instead, or even char my_array[0] or char my_array[5]! Even worse, writing do_something_2(my_array) compiles without any type errors in all of these cases, while do_something_2(&my_array) (i.e. passing the address of an array of exactly the declared size to a function declared to accept precisely that array type) is a constraint violation that the compiler must diagnose!
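A sketch of the contrast the question is pointing at, with made-up names first_char and clear_50: a parameter written as char stuff[50] accepts any char *, whereas a parameter of type char (*)[50] keeps the bound, so sizeof *buf is 50 inside the function.

```c
#include <string.h>

/* Written with an array type, but adjusted to char *stuff: a 5-element
   array is accepted just as readily as a 50-element one. */
char first_char(char stuff[50])
{
    return stuff[0];
}

/* A pointer to an array of exactly 50 char: the bound is part of the
   parameter's type, so sizeof *buf is 50 and callers must pass &my_array
   (or something else of type char (*)[50]). */
void clear_50(char (*buf)[50])
{
    memset(*buf, 0, sizeof *buf);
}
```

Calling first_char with a char small[5] compiles without complaint, while clear_50(&small) requires a diagnostic, because char (*)[5] is not compatible with char (*)[50].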
In summary, does the "&-part" of C11 6.3.2.1p3 have any purpose at all? If so, what is it?
(The only reason I could think of is to make sizeof(&my_array) evaluate to the same thing as sizeof(my_array), but that does not even happen: &my_array is a pointer, so sizeof(&my_array) reports the size of a pointer, not the size of the array itself. See here.)
Indeed, except for sizeof-purposes, why does the type char (*)[50] exist at all?
Given char x[100][50], the automatic conversion of x to a pointer produces a pointer to its first element. Its first element is a char [50], so a pointer to it is a char (*)[50]. So this is the type that the conversion must produce.
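A minimal check of this, assuming a file-scope char x[100][50] and a hypothetical function name: the decayed x can be stored in a char (*)[50] without a cast, and adding 1 moves to the next row.

```c
static char x[100][50];

/* x decays to a pointer to its first element, x[0], which is a char [50];
   so the result has type char (*)[50]. (char *q = x; would need a
   diagnostic, because the types are incompatible.) */
int decay_points_at_rows(void)
{
    char (*p)[50] = x;
    return p == &x[0] && p + 1 == &x[1];
}
```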
When we pass a two-dimensional array, say char x[100][50], to a function with a parameter declared char x[100][50], that parameter is automatically adjusted to char (*x)[50]. The function then accesses elements using a notation such as x[i][j]. If x had been adjusted to some other type, this would not work: we need x to be a pointer to char [50] so that x[i] correctly computes its offset in units of 50-element subarrays, and so that it produces such a subarray as its result, which can then be indexed with [j].
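Here is a sketch of such a function; count_char and its rows argument are hypothetical. Since the 100 in the parameter declaration is discarded by the adjustment, the number of rows is passed separately.

```c
/* The parameter is written as char x[100][50] but adjusted to
   char (*x)[50]; x[i] therefore steps in units of 50-char rows,
   and x[i][j] indexes within row i. */
int count_char(char x[100][50], int rows, char c)
{
    int count = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < 50; j++)
            if (x[i][j] == c)
                count++;
    return count;
}
```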
Sometimes we might want the function to operate only on a portion in the middle of x. To do that, we would pass it the starting address of that portion. For example, we might pass it &x[n] to start at the nth row of the array. As before, the adjusted function parameter is char (*)[50], so we need &x[n] to give us the address of the subarray x[n] with type char (*)[50]. A char * would not be the correct type for the parameter.
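Continuing in the same spirit, a sketch of operating on a middle portion; fill_rows is a made-up name. Because its parameter is char (*block)[50], it accepts both x and &x[n], and block[i] still steps one 50-char row at a time.

```c
#include <string.h>

/* Fills `rows` consecutive rows, starting at the row `block` points to.
   sizeof block[i] is 50, the length of one row. */
void fill_rows(char (*block)[50], int rows, char c)
{
    for (int i = 0; i < rows; i++)
        memset(block[i], c, sizeof block[i]);
}
```

With char x[100][50], fill_rows(&x[10], 5, '#') fills rows 10 through 14; x + 10 would work equally well, since it has the same type and value as &x[10].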
The & operator isn't the exception; it's the "decay" rule for array expressions that is the exception. No other type, not even a struct or union, "decays" to a pointer¹. It's the array type that's weird, not the operator.
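One way to see the contrast (the names are illustrative): a struct that merely wraps a char [50] is passed by value, so the callee works on a copy, while a bare array argument decays to a pointer and the callee writes through to the caller's array.

```c
struct wrapped { char data[50]; };

/* A struct does not decay: the function receives its own copy. */
void modify_struct(struct wrapped w)
{
    w.data[0] = 'S';
}

/* An array argument decays to char *, so this writes to the caller's array. */
void modify_array(char data[50])
{
    data[0] = 'A';
}
```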
For every lvalue expression x of type T, &x yields a value of type T * (pointer to T). Period, no exceptions. If x has type int, then &x has type int *. If x has type double, then &x has type double *. If x has type int [10], then &x has type int (*)[10]. The semantics are exactly the same in all cases.
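These rules can be checked with C11 _Generic, which picks a branch based on the static type of its unevaluated controlling expression. A minimal sketch; addr_types_ok is a made-up name.

```c
/* Returns 1 only if &x has exactly the type T * for each x below,
   including the array case, where T is int [10]. */
int addr_types_ok(void)
{
    int i;
    double d;
    int a[10];
    return _Generic(&i, int *: 1, default: 0)
        && _Generic(&d, double *: 1, default: 0)
        && _Generic(&a, int (*)[10]: 1, default: 0);
}
```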
The decay rule exists because dmr (Dennis Ritchie) wanted to keep the array semantics of B (a precursor to C), but he didn't want to store the explicit pointer those semantics required². So instead of storing the pointer, he came up with the "decay" rule: when the compiler sees an array expression that isn't the operand of the sizeof or unary & operators, it converts that expression from type "N-element array of T" to "pointer to T", and the value of the expression is the address of the first element.
This allowed C to keep B's array indexing semantics, where a[i] is defined as *(a + i): given a starting address a, offset i elements (not bytes! This will be important later) from that address and dereference the result. The tradeoff is that array expressions in C lose their array-ness most of the time.
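A quick check of that definition (hypothetical function name): a[i] and *(a + i) name the same element, and a + 1 lies sizeof(int) bytes past a, one element on rather than one byte.

```c
#include <stddef.h>

/* Returns 1 if a[i] == *(a + i) for every i and a + 1 is one int past a. */
int indexing_is_pointer_arithmetic(void)
{
    int a[4] = { 10, 20, 30, 40 };
    for (int i = 0; i < 4; i++)
        if (a[i] != *(a + i))
            return 0;
    return (size_t)((char *)(a + 1) - (char *)a) == sizeof(int);
}
```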
why does the type char (*)[50] exist at all?
First of all, let's see how that decay rule applies to a 2D array. Imagine an array declaration
A a[N][M];
Remember the rule "the expression a is converted from N-element array of T to pointer to T". In this case, T is "M-element array of A", so the expression a decays from "N-element array of M-element array of A" to "pointer to M-element array of A", or A (*)[M]. So pointer-to-array types fall naturally out of the decay rule anyway.
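A sketch with concrete stand-ins for A a[N][M] (A = double, N = 3, M = 4 are arbitrary choices, and decayed_types_ok is a made-up name). Adding 0 forces the array-to-pointer conversion before _Generic inspects the type.

```c
/* a decays to A (*)[M], i.e. double (*)[4]; a[0] is a double [4],
   which in turn decays to A *, i.e. double *. */
int decayed_types_ok(void)
{
    double a[3][4];
    int outer = _Generic(a + 0, double (*)[4]: 1, default: 0);
    int inner = _Generic(a[0] + 0, double *: 1, default: 0);
    return outer && inner;
}
```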
Secondly, remember how pointer arithmetic works: if p stores the address of an object of type T, then p + 1 yields the address of the next object of that type, not necessarily the next byte. Again, the array indexing operation a[i] is defined as *(a + i): a is the address of the first element of the array, a + 1 is the address of the second element, a + 2 is the address of the third element, etc.
So if a yields the address of an M-element array of A, then a + 1 yields the address of the next M-element array of A.
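A minimal sketch of that scaling, again with arbitrary stand-ins (A = short, N = 2, M = 3) and hypothetical function names: a + 1 is &a[1], which lies sizeof(short[3]) bytes past a.

```c
#include <stddef.h>

/* Byte distance from a to a + 1: one whole row of three shorts. */
size_t row_step_bytes(void)
{
    short a[2][3];
    return (size_t)((char *)(a + 1) - (char *)a);
}

/* a + 1 and &a[1] are the same pointer, of type short (*)[3]. */
int next_row_is_a1(void)
{
    short a[2][3];
    return a + 1 == &a[1];
}
```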
This is exactly how multi-dimensional array indexing works. If we have
char arr[2][2];
then we have
             char        char *        char (*)[2]
      +---+
 arr: |   |  arr[0][0]   arr[0]        arr
      + - +
      |   |  arr[0][1]   arr[0] + 1
      +---+
      |   |  arr[1][0]   arr[1]        arr + 1
      + - +
      |   |  arr[1][1]   arr[1] + 1
      +---+
The expression arr[i][j] is equal to *(arr[i] + j), which is equal to *(*(arr + i) + j). arr + i yields the address of the i'th 2-element array of char, and *(arr + i) + j yields the address of the j'th element of that 2-element array.
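The identities in the diagram can be checked directly; diagram_holds is a hypothetical name. The byte offset between arr and arr + 1 is 2, matching one 2-element row of char.

```c
/* Checks, for char arr[2][2], that arr[i][j] == *(*(arr + i) + j),
   that rows are 2 bytes apart, and that arr + 1 and arr[1] share an address. */
int diagram_holds(void)
{
    char arr[2][2] = { { 'a', 'b' }, { 'c', 'd' } };
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            if (arr[i][j] != *(*(arr + i) + j))
                return 0;
    return (char *)(arr + 1) - (char *)arr == 2
        && (void *)(arr + 1) == (void *)arr[1];
}
```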
We can also use pointer to array types for dynamic allocation. Remember that the common idiom is
T *p = malloc( N * sizeof *p );
This allocates space for N elements of T and assigns the address of that space to p. If I change T to an array type A [M], I get
A (*p)[M] = malloc( N * sizeof *p );
The semantics are exactly the same; all that's changed is the type. I'm allocating space for N elements of type A [M], in other words, an array of type A [N][M].
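A sketch of the idiom with concrete choices (A = int, M = 4, and the function name sum_of_table are all assumptions). A single malloc call provides n rows, and p[i][j] then works as it would for a declared int [n][4] array.

```c
#include <stdlib.h>

#define M 4   /* illustrative row length */

/* Allocates an n x M table of int with one malloc call, fills it so that
   p[i][j] == i * M + j, and returns the sum of all entries, or -1 if the
   allocation fails. The memory is freed before returning. */
long sum_of_table(size_t n)
{
    int (*p)[M] = malloc(n * sizeof *p);   /* sizeof *p == M * sizeof(int) */
    if (p == NULL)
        return -1;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < M; j++) {
            p[i][j] = (int)(i * M + j);
            total += p[i][j];
        }
    free(p);
    return total;
}
```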