
What is the rationale for one past the last element of an array object?

Tags: c, c11

According to N1570 (C11 draft) 6.5.6/8 Additive operators:

Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.

Subclause 6.5.6/9 also contains:

Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.106)

This means that pointer arithmetic like the following is valid:

#include <stdio.h>

int main(void)
{
    int a[3] = {0, 1, 2};
    int *P, *Q;

    P = a + 3; // one past the last element
    Q = a + 2; // last element

    printf("%td\n", ((Q)+1)-(P));
    printf("%td\n", ((Q)-(P))+1);
    printf("%td\n", -((P)-((Q)+1)));

    return 0;
}

I would have expected the standard to disallow pointing to an element outside the array's bounds, since dereferencing such a pointer is undefined behaviour (an array overrun), which makes it potentially dangerous. Is there any rationale for this?

Grzegorz Szpetkowski asked Dec 14 '14 18:12


2 Answers

Specifying the range to loop over as the half-closed interval [start, end), especially for array indices, has certain pleasing properties, as Dijkstra observed in one of his notes (EWD831, "Why numbering should start at zero").

1) You can compute the size of the range simply as end - start. In particular, if the range is specified in terms of array indices, the number of iterations performed by the loop is end - start. If the range were [start, end], the number of iterations would be end - start + 1. Very annoying, isn't it? :)

2) Dijkstra's second observation applies only to (non-negative) integral indices. Both [start, end) and (start, end] have the property mentioned in 1). However, specifying a range as (start, end] requires an index of -1 to represent a loop range that includes index 0: you are allowing an "unnatural" value of -1 just for the sake of representing the range. The [start, end) convention does not have this issue, because both bounds are non-negative integers, and hence natural choices when dealing with array indices.

Allowing a pointer one past the last valid address of the container is open to an objection similar to Dijkstra's objection to allowing -1. However, the [start, end) convention has been in use for so long that it likely persuaded the standards committee to make this exception.

Pradhan answered Nov 05 '22 09:11


The rationale is quite simple. The compiler is not allowed to place an array at the end of memory. To illustrate, assume that we have a 16-bit machine with 16-bit pointers. The lowest address is 0x0000 and the highest address is 0xffff. If you declare char array[256] and the compiler places array at address 0xff00, then technically the array would fit into memory, using addresses 0xff00 through 0xffff inclusive. However, the expression

char *endptr = &array[256];   // endptr points one past the end of the array

would be equivalent to

char *endptr = NULL;          // &array[256] = 0xff00 + 0x0100 = 0x0000

This means that the following loop would not work, because ptr can never compare less than endptr (address 0), so the loop body never executes:

for ( char *ptr = array; ptr < endptr; ptr++ ) { /* use *ptr */ }

So the sections you cited are simply lawyer-speak for "Don't put arrays at the end of a memory region."


Historical note: the earliest x86 processors used a segmented memory scheme wherein memory addresses were specified by a 16-bit pointer register and a 16-bit segment register. The final address was computed by shifting the segment register left by 4 bits and adding the pointer register, e.g.

pointer register    1234
segment register   AB00
                   -----
address in memory  AC234

The resulting address space was 1 MB, but there were end-of-memory boundaries every 64 KB. That's one reason for using lawyer-speak instead of stating "Don't put arrays at the end of memory" in plain English.

user3386109 answered Nov 05 '22 10:11