Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

I have seen it asserted several times now that the following code is not allowed by the C++ Standard:

int array[5]; int *array_begin = &array[0]; int *array_end = &array[5]; 

Is &array[5] legal C++ code in this context?

I would like an answer with a reference to the Standard if possible.

It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?

like image 535
Zan Lynx Avatar asked Jun 12 '09 18:06

Zan Lynx


2 Answers

Yes, it's legal. From the C99 draft standard:

§6.5.2.1, paragraph 2:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

§6.5.3.2, paragraph 3 (emphasis mine):

The unary & operator yields the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

§6.5.6, paragraph 8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.

like image 69
Adam Rosenfield Avatar answered Sep 22 '22 01:09

Adam Rosenfield


Your example is legal, but only because you're not actually using an out of bounds pointer.

Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):

In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.

The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.

Here's what the standard has to say on the subject:

5.7:5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine)

Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:

5.2.1:1:

The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Of course, there's an obvious caveat: Your example doesn't actually show an out-of-bounds pointer. it uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:

[Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. —end note ]

Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.

Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:

  • array + 5 doesn't actually dereference anything, it simply creates a pointer to one past the end of array.
  • &array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced.
  • &array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.

So they don't do quite the same thing, although in this case, the end result is the same.

like image 36
jalf Avatar answered Sep 23 '22 01:09

jalf