
Why do C and C++ compilers allow array lengths in function signatures when they're never enforced?

Tags: c++, arrays, c

It is a quirk of the syntax for passing arrays to functions.

Actually, it is not possible to pass an array by value in C. If you write syntax that looks like it should pass the array, what actually happens is that a pointer to the first element of the array is passed instead.

Since the pointer does not include any length information, the contents of your [] in the function formal parameter list are actually ignored.
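The adjustment described above can be checked at compile time. The following is only a minimal sketch in C++ (the same adjustment applies in C); the function names first_a, first_b, first_c and first_of_short_array are hypothetical and exist only for illustration.

```cpp
#include <type_traits>

// Hypothetical functions: three spellings of the same parameter type.
int first_a(int values[10]) { return values[0]; }
int first_b(int values[])   { return values[0]; }
int first_c(int* values)    { return values[0]; }

// All three have the function type int(int*); the [10] leaves no trace.
static_assert(std::is_same<decltype(first_a), int(int*)>::value,
              "array parameter is adjusted to a pointer");
static_assert(std::is_same<decltype(first_a), decltype(first_b)>::value,
              "int[10] and int[] parameters are the same type");
static_assert(std::is_same<decltype(first_b), decltype(first_c)>::value,
              "int[] and int* parameters are the same type");

// The length written in the signature is not checked against the argument.
int first_of_short_array()
{
    int three[3] = {7, 8, 9};
    return first_a(three);  // compiles even though first_a "asks for" 10
}
```

Calling first_of_short_array() returns the first element of the three-element array, which shows that the 10 in the signature constrains nothing.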

The decision to allow this syntax was made in the 1970s and has caused much confusion ever since...


The length of the first dimension is ignored, but the lengths of the additional dimensions are necessary to allow the compiler to compute offsets correctly. In the following example, the foo function is passed a pointer to the first row of a two-dimensional array.

#include <stdio.h>

void foo(int args[10][20])
{
    printf("%zu\n", sizeof(args[0]));  /* sizeof yields a size_t, so use %zu */
}

int main(int argc, char **argv)
{
    int a[2][20];
    foo(a);
    return 0;
}

The size of the first dimension [10] is ignored; the compiler will not prevent you from indexing off the end (notice that the formal parameter asks for 10 rows, but the actual argument provides only 2). However, the size of the second dimension [20] is used to determine the stride of each row, and here the formal parameter must match the actual argument. Again, the compiler will not prevent you from indexing off the end of the second dimension either.

The byte offset from the base of the array to an element args[row][col] is determined by:

sizeof(int)*(col + 20*row)

Note that if col >= 20, then you will actually index into a subsequent row (or off the end of the entire array).
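As a rough check of the formula above, the sketch below compares it with the address the compiler actually computes. It is written in C++; the helper names offset_by_formula, offset_by_address and offsets_agree are hypothetical, and the row length of 20 is taken from the example.

```cpp
#include <cstddef>

// Byte offset of args[row][col] according to the formula in the text.
std::ptrdiff_t offset_by_formula(std::size_t row, std::size_t col)
{
    return static_cast<std::ptrdiff_t>(sizeof(int) * (col + 20 * row));
}

// Byte offset of args[row][col] measured from the addresses themselves.
// Only the second dimension (20) is part of the parameter's type.
std::ptrdiff_t offset_by_address(int args[][20], std::size_t row, std::size_t col)
{
    const char* base = reinterpret_cast<const char*>(args);
    const char* elem = reinterpret_cast<const char*>(&args[row][col]);
    return elem - base;
}

// Compares both computations for one in-bounds element of a 2x20 array.
bool offsets_agree()
{
    int a[2][20] = {};
    return offset_by_address(a, 1, 5) == offset_by_formula(1, 5);
}
```

For row 1 and column 5, both functions give 25 * sizeof(int) bytes, which is the stride of one full row plus five elements.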

sizeof(args[0]) returns 80 on my machine, where sizeof(int) == 4. However, if I attempt to take sizeof(args), I get the following compiler warning:

foo.c:5:27: warning: sizeof on array function parameter will return size of 'int (*)[20]' instead of 'int [10][20]' [-Wsizeof-array-argument]
    printf("%zd\n", sizeof(args));
                          ^
foo.c:3:14: note: declared here
void foo(int args[10][20])
             ^
1 warning generated.

Here, the compiler is warning that it is only going to give the size of the pointer to which the array has decayed, instead of the size of the array itself.


The problem and how to overcome it in C++

The problem has been explained extensively by pat and Matt. The compiler basically ignores the first dimension of the array parameter, which effectively means it ignores the size of the passed argument.

In C++, on the other hand, you can easily overcome this limitation in two ways:

  • using references
  • using std::array (since C++11)

References

If your function is only trying to read or modify an existing array (not copy it), you can easily use references.

For example, let's assume you want to have a function that resets an array of ten ints, setting every element to 0. You can easily do that by using the following function signature:

void reset(int (&array)[10]) { ... }

Not only will this work just fine, but it will also enforce the dimension of the array.

You can also make use of templates to make the above code generic:

template<class Type, std::size_t N>
void reset(Type (&array)[N]) { ... }

And finally you can take advantage of const correctness. Let's consider a function that prints an array of 10 elements:

void show(const int (&array)[10]) { ... }

By applying the const qualifier we are preventing possible modifications.
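To make the reference signatures above concrete, here is one possible way to fill in the elided bodies. This is only a sketch: the helpers length_of, sum, sum_after_reset and deduced_length are hypothetical names added for illustration, and the body of reset is an assumption about what "resetting" means.

```cpp
#include <cstddef>

// Accepts only an lvalue of type int[10]; other lengths fail to compile.
void reset(int (&array)[10])
{
    for (int& value : array) value = 0;
}

// Hypothetical generic helpers: N is deduced from the argument's type.
template <class Type, std::size_t N>
std::size_t length_of(const Type (&)[N])
{
    return N;
}

template <class Type, std::size_t N>
Type sum(const Type (&array)[N])
{
    Type total = Type();
    for (const Type& value : array) total += value;
    return total;
}

int sum_after_reset()
{
    int ten[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    reset(ten);
    // int two[2] = {1, 2};
    // reset(two);  // would not compile: int[2] cannot bind to int (&)[10]
    return sum(ten);
}

std::size_t deduced_length()
{
    double five[5] = {};
    return length_of(five);
}
```

Unlike the pointer version, the array length here is part of the parameter's type, so a mismatch is a compile-time error rather than a silent out-of-bounds access.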


The standard library class for arrays

If you consider the above syntax both ugly and unnecessary, as I do, we can throw it in the can and use std::array instead (since C++11).

Here's the refactored code:

void reset(std::array<int, 10>& array) { ... }
void show(std::array<int, 10> const& array) { ... }

Isn't it wonderful? Not to mention that the generic code trick I showed you earlier still works:

template<class Type, std::size_t N>
void reset(std::array<Type, N>& array) { ... }

template<class Type, std::size_t N>
void show(const std::array<Type, N>& array) { ... }

Not only that, but you get copy and move semantics for free. :)

template<class Type, std::size_t N>
void copy(std::array<Type, N> array) {
    // a copy of the original passed array
    // is made and can be dealt with independently
    // from the original
}
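A small sketch of that by-value behaviour follows. The function names zeroed_copy and original_survives are hypothetical, and using fill to implement reset is an assumption, since the original bodies are elided.

```cpp
#include <array>
#include <cstddef>

// Takes the array by reference: the caller's array is modified.
template <class Type, std::size_t N>
void reset(std::array<Type, N>& array)
{
    array.fill(Type());
}

// Takes the array by value: the parameter is an independent copy.
template <class Type, std::size_t N>
std::array<Type, N> zeroed_copy(std::array<Type, N> array)
{
    reset(array);   // only the local copy is cleared
    return array;   // returned by value as well
}

bool original_survives()
{
    std::array<int, 3> original = {1, 2, 3};
    std::array<int, 3> copy = zeroed_copy(original);
    return original[0] == 1 && original[2] == 3
        && copy[0] == 0 && copy.size() == 3;
}
```

Here the size travels with the type, so size() is always available inside the function, which a decayed pointer cannot offer.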

So, what are you waiting for? Go use std::array.


It's a fun feature of C that allows you to effectively shoot yourself in the foot if you're so inclined.

I think the reason is that C is just a step above assembly language. Size checking and similar safety features have been removed to allow for peak performance, which isn't a bad thing if the programmer is being very diligent.

Also, assigning a size to the function argument has the advantage that when the function is used by another programmer, there's a chance they'll notice a size restriction. Just using a pointer doesn't convey that information to the next programmer.


First, C never checks array bounds. Doesn't matter if they are local, global, static, parameters, whatever. Checking array bounds means more processing, and C is supposed to be very efficient, so array bounds checking is done by the programmer when needed.

Second, there is a trick that makes it possible to pass an array to a function by value. It is also possible to return an array from a function by value. You just need to wrap the array in a new data type using struct. For example:

typedef struct {
  int a[10];
} myarray_t;

myarray_t my_function(myarray_t foo) {

  myarray_t bar;

  ...

  return bar;

}

You have to access the elements like this: foo.a[1]. The extra ".a" might look weird, but this trick adds great functionality to the C language.
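For illustration, here is a filled-in variant of the same trick, written so that it also compiles as C++. The function names doubled and caller_unchanged are hypothetical, and the doubling is just an arbitrary operation chosen to show that the caller's data is not touched.

```cpp
// The array is a struct member, so the whole struct (array included)
// is copied when passed or returned by value.
struct myarray_t {
    int a[10];
};

// Hypothetical example function: works on its own copy of the array.
myarray_t doubled(myarray_t foo)
{
    for (int& value : foo.a) value *= 2;  // modifies the local copy only
    return foo;                           // returned by value
}

bool caller_unchanged()
{
    myarray_t in = {};
    in.a[1] = 21;
    myarray_t out = doubled(in);
    return in.a[1] == 21 && out.a[1] == 42;
}
```

The cost is that every call copies all ten ints, which is usually fine for small arrays and worth reconsidering for large ones.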


To tell the compiler that myArray points to an array of at least 10 ints:

void bar(int myArray[static 10])

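A good compiler should give you a warning if you call bar with a null pointer or with an array it can see has fewer than 10 elements. Note that "static 10" promises at least 10 elements, so accessing myArray[10] is not necessarily an error. Without the "static" keyword, the 10 would mean nothing at all. This syntax is C99 and is not valid C++.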


This is a well-known "feature" of C, passed over to C++ because C++ is supposed to correctly compile C code.

The problem arises from several aspects:

  1. An array name is supposed to be completely equivalent to a pointer.
  2. C is supposed to be fast; it was originally developed to be a kind of "high-level Assembler" (especially designed to write the first "portable Operating System": Unix), so it is not supposed to insert "hidden" code; runtime range checking is thus "forbidden".
  3. Machine code generated to access a static array or a dynamic one (either in the stack or allocated) is actually different.
  4. Since the called function cannot know the "kind" of array passed as argument everything is supposed to be a pointer and treated as such.

You could say arrays are not really supported in C (this is not really true, as I was saying before, but it is a good approximation); an array is really treated as a pointer to a block of data and accessed using pointer arithmetic. Since C does NOT have any form of RTTI, you have to declare the size of the array element in the function prototype (to support pointer arithmetic). This is even "more true" for multidimensional arrays, where the element is itself an array.

Anyway all above is not really true anymore :p

Most modern C/C++ compilers do support bounds checking, but it is off by default. Reasonably recent versions of gcc, for example, do compile-time range checking with "-O3 -Wall -Wextra" and run-time checking with sanitizers such as "-fsanitize=address" or "-fsanitize=bounds" (as far as I know, "-fbounds-checking" is not a mainline gcc option).