Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are arrays "implemented" in C?

An array is notably not a pointer. To be sure, both lvalues seem to contain the (1-dimensional) coordinate of some position in (1-dimensional) virtual memory. But consider this example.

#include <stdlib.h>
#include <stdio.h>
int main(){
  char buffer0[4096];
  char* buffer1 = malloc(4096);
  printf("lvalue %16p  sizeof %lu\n", (void *) buffer0, sizeof(buffer0));
  printf("lvalue %16p  sizeof %lu\n", (void *) buffer1, sizeof(buffer1));
// Example output:  lvalue   0x7ffcb70e8620  sizeof 4096
// Example output:  lvalue         0x7a4420  sizeof 8
}

Practical differences that come to mind are:

  1. Arrays know how big (in bytes) they are (and, by extension, they know how many elements they have); pointers don't (but malloc() must know how big a pointer is, to know how much to free() given just the pointer...!)
  2. Arrays are "garbage collected" (no need to free() them); pointers must be freed manually (if they own a non-trivial amount of memory, ie. through malloc())
  3. Arrays "live" in the stack (high virtual memory addresses, at least on my platform); pointers "live" in the heap (low virtual memory addresses)
  4. Arrays decay to pointers when passed to functions
  5. Arrays cannot be resized; pointers can

Overall, arrays seem to be much smarter (but less versatile) than pointers (they know how big they are, how many elements they have, and they have automatic memory management).


Questions

  1. How do arrays "know" how big they are? How is this implemented?
  2. In general, how are arrays implemented in the C language? (Does the compiler do this, or does the kernel?
like image 683
étale-cohomology Avatar asked Dec 03 '22 11:12

étale-cohomology


2 Answers

How do arrays "know" how big they are? How is this implemented?

Arrays don't know how big they are - there is no metadata associated with the array to indicate size (or type, or anything else). During translation, the compiler knows how big the array is, and anything that relies on that knowledge (pointer arithmetic, sizeof operations, etc.) is handled at that time. Once machine code is generated, arrays are just dumb chunks of memory - there's no way to determine at runtime how big an array is by looking at the array object itself (with the exception of variably modified types like variable-length arrays, sizeof operations are computed during translation, not runtime).

In general, how are arrays implemented in the C language? (Does the compiler do this, or does the kernel?

Arrays are nothing more than a contiguous sequence of objects of the same type. For the declaration

T arr[N]; // for any type T

you get

     +---+
arr: |   | arr[0]
     +---+
     |   | arr[1]
     +---+
     |   | arr[2]
     +---+
      ...
     +---+ 
     |   | arr[N-1]
     +---+

There is no arr object independent of the array elements themselves, nor is any metadata set aside anywhere for size, starting address, type, or anything else.

The subscript operation arr[i] is defined as *(arr + i) - given the starting address of the array, offset i elements (not bytes!) from that address and dereference the result.

You are correct that arrays are not pointers - however, unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of array type will be converted ("decay") to an expression of pointer type, and the value of the expression will be the address of the first element of the array (again, this is all done during translation, not at runtime).

Thus, when you write something like x = arr[i];, the compiler will convert the expression arr to a pointer value, so the subscript operation works.

By contrast, when you write ap = &arr;, the compiler does not convert arr to a pointer type. The result is still the same as the address of the first element, but the type is different - instead of T *, the type is T (*)[N], or "pointer to N-element array of T".

like image 171
John Bode Avatar answered Dec 12 '22 16:12

John Bode


  1. The type of an array contains its size (as a compile-time constant) and its member type. So since the compiler knows the type of all variables it can just calculate sizeof(the_array) as sizeof(array_type.element_type) * array_type.element_count.

  2. In terms of memory allocation etc. they're simply treated like any other variable:

    If you declare an automatic variable of an array type, that adds sizeof(the_array_type) bytes to the size of the stack frame. So when the function is entered, the stack pointer is increased by enough to store the contents of the array, and when the function is exited, it is decreased by the same amount.

    If you declare a variable with static duration, sizeof(the_array_type) will be reserved in the binary's data segment.

    Again, that's the same way all variables of any type are treated. The important thing is simply that an array contains its elements, so its size is the size of its contents, whereas a pointer merely points to its elements and its size is completely independent of what it points to.

    When used as an r-expression outside of sizeof, the name of an array is simply compiled to its address (and typed as a pointer).

    Does the compiler do this, or does the kernel?

    All of this is done by the compiler.

like image 38
sepp2k Avatar answered Dec 12 '22 17:12

sepp2k