An array is notably not a pointer. To be sure, both lvalues seem to contain the (1-dimensional) coordinate of some position in (1-dimensional) virtual memory. But consider this example.
#include <stdlib.h>
#include <stdio.h>
int main(){
char buffer0[4096];
char* buffer1 = malloc(4096);
printf("lvalue %16p sizeof %lu\n", (void *) buffer0, sizeof(buffer0));
printf("lvalue %16p sizeof %lu\n", (void *) buffer1, sizeof(buffer1));
// Example output: lvalue 0x7ffcb70e8620 sizeof 4096
// Example output: lvalue 0x7a4420 sizeof 8
}
Practical differences that come to mind are:
malloc()
must know how big a pointer is, to know how much to free()
given just the pointer...!)free()
them); pointers must be freed manually (if they own a non-trivial amount of memory, ie. through malloc()
)Overall, arrays seem to be much smarter (but less versatile) than pointers (they know how big they are, how many elements they have, and they have automatic memory management).
Questions
How do arrays "know" how big they are? How is this implemented?
Arrays don't know how big they are - there is no metadata associated with the array to indicate size (or type, or anything else). During translation, the compiler knows how big the array is, and anything that relies on that knowledge (pointer arithmetic, sizeof
operations, etc.) is handled at that time. Once machine code is generated, arrays are just dumb chunks of memory - there's no way to determine at runtime how big an array is by looking at the array object itself (with the exception of variably modified types like variable-length arrays, sizeof
operations are computed during translation, not runtime).
In general, how are arrays implemented in the C language? (Does the compiler do this, or does the kernel?
Arrays are nothing more than a contiguous sequence of objects of the same type. For the declaration
T arr[N]; // for any type T
you get
+---+
arr: | | arr[0]
+---+
| | arr[1]
+---+
| | arr[2]
+---+
...
+---+
| | arr[N-1]
+---+
There is no arr
object independent of the array elements themselves, nor is any metadata set aside anywhere for size, starting address, type, or anything else.
The subscript operation arr[i]
is defined as *(arr + i)
- given the starting address of the array, offset i
elements (not bytes!) from that address and dereference the result.
You are correct that arrays are not pointers - however, unless it is the operand of the sizeof
or unary &
operators, or is a string literal used to initialize a character array in a declaration, an expression of array type will be converted ("decay") to an expression of pointer type, and the value of the expression will be the address of the first element of the array (again, this is all done during translation, not at runtime).
Thus, when you write something like x = arr[i];
, the compiler will convert the expression arr
to a pointer value, so the subscript operation works.
By contrast, when you write ap = &arr;
, the compiler does not convert arr
to a pointer type. The result is still the same as the address of the first element, but the type is different - instead of T *
, the type is T (*)[N]
, or "pointer to N-element array of T
".
The type of an array contains its size (as a compile-time constant) and its member type. So since the compiler knows the type of all variables it can just calculate sizeof(the_array)
as sizeof(array_type.element_type) * array_type.element_count
.
In terms of memory allocation etc. they're simply treated like any other variable:
If you declare an automatic variable of an array type, that adds sizeof(the_array_type)
bytes to the size of the stack frame. So when the function is entered, the stack pointer is increased by enough to store the contents of the array, and when the function is exited, it is decreased by the same amount.
If you declare a variable with static duration, sizeof(the_array_type)
will be reserved in the binary's data segment.
Again, that's the same way all variables of any type are treated. The important thing is simply that an array contains its elements, so its size is the size of its contents, whereas a pointer merely points to its elements and its size is completely independent of what it points to.
When used as an r-expression outside of sizeof
, the name of an array is simply compiled to its address (and typed as a pointer).
Does the compiler do this, or does the kernel?
All of this is done by the compiler.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With