
What are the ramifications of passing & assigning arrays as pointers in C++?

As background, I gave an answer to this post a little while ago:

Return array in a function

And it unintentionally kicked off a really long comment chain about pointers vs. arrays in C++ because I tried to oversimplify and I made the statement "arrays are pointers". Though my final answer sounds pretty decent, it was only after some heavy editing in response to a lot of the comments I got.

This question is not meant to be troll bait; I understand that a pointer and an array are not the same thing, but some of the available syntax in the C++ language certainly makes them behave very similarly in many cases. (FYI, my compiler is i686-apple-darwin9-g++-4.0.1 on OS X 10.5.8.)

For instance, this code compiles and runs just fine for me (I realize x[8] is a potential segmentation fault):

  //this is just a simple pointer
  int *x = new int;
  cout << x << " " << (*x) << " " << x[8] << endl; //might segfault

  //this is a dynamic array
  int* y = new int[10];
  cout << y << " " << (*y) << " " << y[8] << endl;

  //this is a static array
  int z[10];
  cout << z << " " << (*z) << " " << z[8] << endl;

That particular snippet makes it look like pointers and arrays can be used almost identically, but if I add this to the bottom of that code, the last two lines won't compile:

  x = y;
  x = z;
  y = x;
  y = z;
  //z = x; //won't compile
  //z = y; //won't compile

So clearly the compiler at least understands that z and x are different things, but I can interchange x and y just fine.

This gets even more confusing when you look at passing arrays to functions and returning arrays from functions. Consider this example (again, I am aware of the potential segmentation faults here when passing x):

#include <iostream>
using namespace std;

void foo(int in[])
{
  cout << in[8] << endl;
}

void bar(int* in)
{
  cout << in[8] << endl;
}

int main()
{
  //this is just a simple pointer
  int *x = new int;
  foo(x);
  bar(x);

  //this is a dynamic array
  int* y = new int[10];
  foo(y);
  bar(y);

  //this is a static array
  int z[10];
  foo(z);
  bar(z);
}

All this code properly compiles and runs on my machine.

I feel like I have a decent internal understanding of what's going on here, but if you asked me to articulate exactly what's happening, I don't feel like I could satisfactorily explain. So here's what I'm getting at:

  • When I pass an array to a function as int* in instead of int in[], what am I gaining or losing? Is the same true when returning an array as int*? Are there ever bad side effects from doing this?

  • If I asked you what the data type of y is, would you say pointer to int, array of ints or something else?

  • Similarly, what happens when I say x = y vs. x = z? I'm still able to use x[] and access the things that were originally in y or z, but is this really just because pointer arithmetic happens to land me in memory space that is still valid?

I've dug through all the similar array/pointer questions on SO and I'm having trouble finding the definitive explanation that clears this up for me once and for all.

asked Aug 18 '10 18:08 by Brent Writes Code


3 Answers

C++ is statically typed, so of course the compiler understands that x and z are not the same kind of thing. They have different types - z is an array, x and y are pointers.
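To make the type difference concrete, here is a compile-time sketch using the question's three declarations. The function name `declared_types_differ` is hypothetical, and it assumes a C++11 or later compiler so that `std::is_same` and `std::extent` from `<type_traits>` are available.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: the question's declarations, with their types checked at compile time.
bool declared_types_differ() {
    int* x = new int;
    int* y = new int[10];
    int z[10] = {};

    static_assert(std::is_same<decltype(x), int*>::value, "x is a pointer");
    static_assert(std::is_same<decltype(y), int*>::value, "y is the same kind of pointer");
    static_assert(std::is_same<decltype(z), int[10]>::value, "z is an array of 10 int");
    static_assert(std::extent<decltype(z)>::value == 10, "the bound is part of z's type");
    // Unary + forces the array-to-pointer conversion, so the result is int*.
    static_assert(std::is_same<decltype(+z), int*>::value, "z decays to int*");

    bool ok = (+z == &z[0]);   // the decayed value is the address of the first element
    delete x;                  // single-object new pairs with delete
    delete[] y;                // array new pairs with delete[]
    return ok;
}
```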

The reason z = x doesn't compile isn't (just) that the types are incompatible, though, it's that you can't assign to an array variable at all. Ever. x = z assigns to x, a pointer to the first element of z. x = y assigns the value of y to x.[*]
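A minimal sketch of the assignment rules just described, assuming a C++11 compiler; `pointer_aliases_array` is a hypothetical name. After `x = z`, only `x` has changed, and writes through `x` land in `z`'s storage.

```cpp
#include <cassert>

// Hypothetical helper: after x = z, x aliases z's first element.
bool pointer_aliases_array() {
    int z[10] = {};
    int* x = nullptr;

    x = z;        // z decays to &z[0]; only the pointer x is modified
    x[3] = 42;    // writing through x writes into z
    // z = x;     // error: an array cannot be assigned to

    return x == &z[0] && z[3] == 42;
}
```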

When I pass an array to a function as int* in instead of int in[], what am I gaining or losing?

They do exactly the same thing, so you have no choice to make. Possibly you have been misled by the fact that C++ syntax permits int in[] as a function parameter. The type of the parameter in is not any kind of array, it is int*.
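The parameter adjustment can be checked directly. This sketch reuses the question's `foo` and `bar` with their bodies reduced to no-ops, and adds a hypothetical `parameter_size` helper; it assumes C++11 for `<type_traits>`. (Compilers may warn about `sizeof` on an array parameter, which is exactly the point being illustrated.)

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The question's two functions, bodies reduced to no-ops for this sketch.
void foo(int in[]) { (void)in; }   // 'int in[]' is adjusted to 'int* in'
void bar(int* in) { (void)in; }

// Therefore both functions have exactly the same type: void(int*).
static_assert(std::is_same<decltype(foo), decltype(bar)>::value, "same function type");
static_assert(std::is_same<decltype(foo), void(int*)>::value, "the parameter is int*");

// Hypothetical helper: even a written bound is ignored, so sizeof sees a pointer.
std::size_t parameter_size(int in[10]) {
    return sizeof(in);
}
```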

If I asked you what the data type of y is

It's int*. That's what it's declared as, so that's what it is.

The value that it holds is a pointer to (the first element of) an array. I frequently use that formula: "pointer to (the first element of)" in cases where I'd like to say "pointer to array", but can't because there's the potential for ambiguity as to whether the type involved is pointer-to-array, or not.
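A short sketch of that distinction between the declared type and the value held, assuming C++11; `y_points_to_first_element` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: y's type is int*, while its value points at element 0 of an array.
bool y_points_to_first_element() {
    int* y = new int[10]();   // value-initialized: all 10 elements are 0
    static_assert(std::is_same<decltype(y), int*>::value, "y is a plain int*");

    y[0] = 7;
    bool ok = (*y == 7) && (y == &y[0]);   // *y and y[0] name the same element
    delete[] y;                            // memory from new[] must be released with delete[]
    return ok;
}
```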

However, pointers-to-arrays are rarely used in C++, because the size of the array is part of the type. There's no such type as "pointer to an array of int" in C++, just "pointer to array of 1 int", "pointer to array of 2 int", etc. This usually isn't very convenient, hence the use of a pointer to the first element of an array whose size may not be known at compile time.
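For comparison, this is what a genuine pointer-to-array looks like, with the bound baked into its type. A minimal sketch; `pointee_size` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a pointer to an array carries the array's bound in its type.
std::size_t pointee_size() {
    int z[10] = {};
    int (*p)[10] = &z;     // pointer to an array of 10 int
    // int (*q)[5] = &z;   // error: int(*)[10] does not convert to int(*)[5]
    (*p)[3] = 1;           // dereference to get the array, then subscript it
    assert(z[3] == 1);
    return sizeof(*p);     // the whole array: 10 * sizeof(int)
}
```

Because every bound is a distinct type, code that wants to accept arrays of any length usually takes a pointer to the first element instead, as the answer says.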

is this really just because pointer arithmetic happens to land me in memory space that is still valid

Pretty much, yes. The size of the array is part of the type of z, but is not part of the type of x or y, and also is not part of the type of the result of z decaying to a pointer to its first element. So y could be a pointer to the first of 10 elements, or just to 1 element. You only know the difference by context, and by requiring of your callers that the value you have points to what it's supposed to point to.
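Since an `int*` carries no length, the usual C-style convention is for the caller to pass the element count alongside the pointer. This is a sketch of that convention, not the only option; `sum_elements` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the caller promises that 'data' points to at least 'count' ints.
long sum_elements(const int* data, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {   // never index at or past 'count'
        total += data[i];
    }
    return total;
}
```

With the question's variables, `sum_elements(z, 10)` and `sum_elements(x, 1)` are both fine, while `sum_elements(x, 10)` would read past the single `int` that `x` points to.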

"Happens" is leaving too much to chance, though - part of your job when using arrays is to make sure you don't stray beyond their bounds.

[*] z = x isn't allowed, even after you've done x = z, because z is (and always will be) a particular array of 10 ints in memory. Back when C was designed, there was a question of whether array variables could in principle be "reseatable", meaning that you could do:

int z[10];
int y[10];
z = y; // z is now an alias for y
y[0] = 3;
// z[0] now has the value 3

Dennis Ritchie decided not to allow this, because it would prevent him from distinguishing arrays from pointers in a way that he needed to do. So z cannot ever refer to a different array from the one it was declared as. Read all about it here: http://cm.bell-labs.com/cm/cs/who/dmr/chist.html, under "Embryonic C".

Another plausible meaning for z = y could be memcpy(z,y,sizeof(z)). It wasn't given that meaning either.
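If copying the contents is what you want, you have to ask for it explicitly. A sketch assuming C++11; `copy_contents` is a hypothetical name, and `std::array` is shown as the standard wrapper that does give `=` a copy meaning.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical helper: explicit ways to copy array contents.
bool copy_contents() {
    int y[10] = {1, 2, 3};
    int z[10] = {};
    // z = y;                          // error: built-in arrays are not assignable
    std::memcpy(z, y, sizeof(z));      // the memcpy meaning, spelled out by hand
    bool same = std::equal(y, y + 10, z);

    // std::array wraps a fixed-size array in a class type that supports '='.
    std::array<int, 10> a = {{1, 2, 3}};
    std::array<int, 10> b = {};
    b = a;                             // element-wise copy
    return same && b == a;
}
```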

answered Nov 08 '22 02:11 by Steve Jessop


The fundamental difference between a pointer and an array is that the pointer has a unique memory address that holds the address of the array data.

An array name, though treated as a pointer based on context, is not backed by a separate pointer object in memory; there is no stored address whose own address you could take or that you could overwrite. When it is treated as a pointer, its value is simply the address of its first element.

That is why you can assign its value to another pointer but not vice versa. There is no pointer memory location to treat as an l-value.
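One nuance worth illustrating: `&z` is legal, but it yields the address of the array object itself (the same address as its first element, with type `int(*)[10]`), not the address of some stored pointer. A sketch assuming C++11; `array_address_demo` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: an array has an address, but stores no pointer inside it.
bool array_address_demo() {
    int z[10] = {};
    int* first = z;            // decay: a pointer value computed from z
    int (*whole)[10] = &z;     // address of the array object itself

    static_assert(std::is_same<decltype(&z), int (*)[10]>::value, "&z points to the whole array");
    static_assert(sizeof(z) == 10 * sizeof(int), "z holds only its elements, no pointer");

    // Same address, different types.
    return static_cast<void*>(whole) == static_cast<void*>(first);
}
```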

answered Nov 08 '22 02:11 by Amardeep AC9MF


Arrays are not pointers, but arrays easily decay to pointers to their first element. Additionally, C (and thus C++) allows array access syntax to be used for pointers.

When I pass an array to a function as int* in instead of int in[], what am I gaining or losing? Is the same true when returning an array as int*? Are there ever bad side effects from doing this?

You're gaining nothing, because as a function parameter, int[] is just another way to write int*. If you want to pass an array, you have to pass it by reference, exactly matching its size. Non-type template arguments can ease the problem with the exact size:

#include <cstddef>

template< std::size_t N >
void f(int (&arr)[N])
{
   // ... use arr; N is its element count
}
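A runnable variant of that template, showing that `N` is deduced from the argument's type; `element_count` is a hypothetical name standing in for the answer's `f`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of f: binds to a real array by reference and reports its bound.
template <std::size_t N>
std::size_t element_count(int (&arr)[N]) {
    (void)arr;    // the array is available here with its full type int[N]
    return N;     // deduced from the argument, e.g. 10 for int z[10]
}

// int* y = new int[10];
// element_count(y);   // error: an int* has no bound from which to deduce N
```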

If I asked you what the data type of y is, would you say pointer to int, array of ints or something else?

It's a pointer to the first element of a dynamically allocated array.

Similarly, what happens when I say x = y vs. x = z?

You assign the addresses of different objects of different types to the same pointer. (And you leak an int on the heap. :))
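To avoid that leak, release what `x` owns before re-pointing it. A minimal sketch with a hypothetical name, assuming `y` is treated as the owner of the array afterwards.

```cpp
#include <cassert>

// Hypothetical helper: reassign x without leaking the int it originally owned.
bool reassign_without_leak() {
    int* x = new int(1);
    int* y = new int[10]();
    delete x;        // free the single int first...
    x = y;           // ...then x merely observes y's array
    x[8] = 8;
    bool ok = (y[8] == 8);
    delete[] y;      // y is the owner here, so it is released exactly once, with delete[]
    return ok;
}
```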

I'm still able to use x[] and access the things that were originally in y or z, but is this really just because pointer arithmetic happens to land me in memory space that is still valid?

Yep. As I said, pointers conveniently and confusingly allow array syntax to be applied to them. However, that still doesn't make a pointer an array.
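The reason the syntax carries over is that the built-in subscript is defined in terms of pointer arithmetic: `E1[E2]` means `*((E1) + (E2))`. A small sketch; `subscript_is_pointer_arithmetic` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: p[i] is *(p + i), whether p is a pointer or a decayed array.
bool subscript_is_pointer_arithmetic() {
    int z[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
    int* p = z;
    return p[8] == *(p + 8)     // subscript on a pointer
        && z[8] == *(z + 8)     // z decays to &z[0] before the addition
        && 8[p] == p[8];        // legal but obscure, since the addition commutes
}
```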

answered Nov 08 '22 01:11 by sbi