In C, if we have an array like a[10], then a and &a have the same pointer value (but not the same type). Why was C designed like this?
Was it to save the additional space required for storing &a? ... This makes sense when you think of the fact that a can never point to any other location, so storing &a is meaningless.
"the fact that a can never point to any other location"
This isn't a fact, though. If a is an array, a doesn't point anywhere because a is not a pointer. Given int a[42];, a names an array of 42 int objects; it is not a pointer to an array of 42 int objects (that would be int (*a)[42];).
&x gives you the address of the object x; if x is a variable of array type, then &x gives you the address of the array; if nothing else, this is consistent with the behavior of & for any other object.
A better question would be "why does an array (like a) decay to a pointer to its initial element in most cases when it is used?" While I don't know with certainty why the language was designed this way, it does make the specification of many things much simpler; notably, arithmetic with an array is effectively the same as arithmetic with a pointer.
The design is quite elegant and pretty much necessary when you consider how referring to an array works at the assembly level. Consider the following C code, and then the x86 assembly a compiler might produce for it:
void f(int array[]) { return; }
void g(int (*array)[]) { return; }

int main(void)
{
    int a[5];
    f(a);
    g(&a);
    return 0;
}
The array a will take up 20 bytes on the stack, since an int takes up 4 bytes on most platforms. With the register EBP pointing at the base of the stack's activation record, you would be looking at the following assembly for the main() function above:
//subtract 20 bytes from the stack pointer register ESP for the array
sub esp, 20
//the array is now allocated on the stack
//get the address of the start of the array, and move it into EAX register
lea eax, [ebp - 20]
//push the address contained in EAX onto the stack for the call to f()
//this is pretty much the only way that f() can refer to the array allocated
//in the stack for main()
push eax
call f
//clean-up the stack
pop eax
//get a pointer to the array of int's on the stack
//(so the type is "int (*)[]")
lea eax, [ebp - 20]
//make the function call again using the stack for the function parameters
push eax
call g
//...clean up the stack and return
The assembly instruction LEA, or "Load Effective Address", calculates the address from the expression in its second operand and moves it into the register designated by the first operand. So every time we use that instruction, it's like the C address-of operator. You'll notice that the address where the array starts (i.e., [ebp - 20], or 20 bytes below the frame base address held in the register EBP) is what is passed to both of the functions f and g. That's pretty much the only way it can be done at the machine-code level in order to refer to a chunk of memory allocated in the stack of one function from another function without actually copying the contents of the array.
The take-away is that arrays are not the same as pointers, but at the same time, the only effective way to refer to an array on the right-hand side of the assignment operator, or when passing it to a function, is to pass it around by reference. That means referring to the array by name is, at the machine level, exactly the same as getting a pointer to the array. Therefore, at the machine-code level, a, &a, and even &a[0] in these situations devolve into the same set of instructions (in this example, lea eax, [ebp - 20]). But again, an array type is not a pointer, and a and &a are not the same type. Since the array designates a chunk of memory, the easiest and most effective way to get a reference to it is through a pointer.
In fact, a[0] is stored at the same memory location as a itself, and &a is the address where a is stored.
These are different ways of denoting the same location.
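Going to the third element of the array (a[2]) is the same as reading the memory at the byte address (char *)a + sizeof(a[0]) * 2, where sizeof(a[0]) is the size of one element. C's own pointer arithmetic already scales by the element size, so in C this is simply *(a + 2).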
Your explanation is on the right track, although I doubt that the amount of space was the issue; more likely it was the special case of needing to allocate it at all. Normally, every object that C deals with has a value (or values) and an address. So an actually allocated pointer already has an address of its own, and it makes sense to have both value and address available for real pointers.
But an array reference already is an address. For C to make a double-indirect pointer via the & operator would have required allocating space somewhere, and this would have represented a huge divergence in philosophy for the simple early dmr C compiler.
Where it would have stored this new pointer is a good question. With the same storage class as the array? What if it was a parameter? It's Pandora's box and the easiest way to resolve it is to define away the operation. If the developer wants an indirect pointer he can always declare one.
Plus, it makes sense for & to return the address of an array object, because that's consistent with its use elsewhere.
A good way to look at this is to see that objects have values and addresses, and the array reference is just a shorthand syntax. Actually requiring &a would have been a bit pedantic because the reference a wouldn't have had another interpretation anyway.