
Access an array from the end in C?

I recently noticed that in C, there is an important difference between array and &array for the following declaration:

char array[] = {4, 8, 15, 16, 23, 42};

The former decays to a pointer to char in most expressions, while the latter is a pointer to an array of 6 chars. It is also notable that a[b] is syntactic sugar for *(a + b). Indeed, you could write 2[array] and it works perfectly according to the standard.
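As a minimal sketch of these two observations (the helper names are hypothetical, chosen only for illustration), sizeof can report what each expression points to without evaluating anything:

```c
#include <assert.h>
#include <stddef.h>

static char array[] = {4, 8, 15, 16, 23, 42};

/* `array` decays to char *, so *array is a single char. */
size_t element_pointee_size(void) { return sizeof *array; }

/* &array has type char (*)[6], so *&array is the whole 6-char array.
   sizeof does not evaluate its operand here. */
size_t whole_array_pointee_size(void) { return sizeof *&array; }

/* a[b] means *(a + b), and addition commutes, so 2[array] is array[2]. */
int commuted_subscript_agrees(void)
{
    assert(sizeof array == 6);
    return 2[array] == array[2];
}
```

Under these assumptions, the first helper should report the size of one char and the second the size of the whole array.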

So we could take advantage of this information to write this:

char last_element = (&array)[1][-1];

&array points to an object that is 6 chars in size, so (&array)[1] is a pointer to the chars located right after the array. By indexing it with [-1], I am therefore accessing the last element.

With this I could, for example, reverse the entire array:

void swap(char *a, char *b) { *a ^= *b; *b ^= *a; *a ^= *b; }

int main() {
    char u[] = {1,2,3,4,5,6,7,8,9,10};

    for (int i = 0; i < sizeof(u) / 2; i++)
        swap(&u[i], &(&u)[1][-i - 1]);
}

Does this method for accessing an array by the end have flaws?

nowox asked May 02 '20 13:05




2 Answers

The C standard does not define the behavior of (&array)[1].

Consider &array + 1. This is defined by the C standard, for two reasons:

  • When doing pointer arithmetic, the result is defined as long as it points anywhere from the first element (with index 0) of an array to one beyond the last element.
  • When doing pointer arithmetic, a pointer to a single object behaves like a pointer to an array with one element. In this case, &array is a pointer to a single object (that is itself an array, but the pointer arithmetic is for the pointer-to-the-array, not a pointer-to-an-element).

So &array + 1 is defined pointer arithmetic that points just beyond the end of array.
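A small sketch of that point, with a hypothetical helper name: the pointer &array + 1 can be formed, compared, and subtracted, as long as it is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

static char array[] = {4, 8, 15, 16, 23, 42};

/* Counts how many whole arrays lie between &array and &array + 1.
   Only pointer arithmetic and comparison are used; nothing is dereferenced. */
ptrdiff_t arrays_between_start_and_past_end(void)
{
    char (*start)[sizeof array] = &array;
    char (*past_end)[sizeof array] = &array + 1; /* one past the single array object */
    assert(past_end > start);
    return past_end - start;
}
```

The difference is counted in units of the whole array, not in chars, because both pointers have type pointer-to-array.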

However, by definition of the subscript operator, (&array)[1] is *(&array + 1). While &array + 1 is defined, applying * to it is not. C 2018 6.5.6 8 explicitly tells us, about the result of pointer arithmetic, “If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.”

Because of the way most compilers are designed, the code in the question may move data around as you desire. However, this is not a behavior you should rely on. You can obtain a good pointer to just beyond the last element of the array with char *End = array + sizeof array / sizeof *array;. Then you can use End[-1] to refer to the last element, End[-2] to refer to the penultimate element, and so on.
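As a sketch of how the reversal from the question might look with such an end pointer (the function names here are hypothetical, and a temporary is used instead of the XOR swap so that swapping an element with itself would also be safe):

```c
#include <assert.h>
#include <stddef.h>

/* Swap two chars through a temporary. */
static void swap_chars(char *a, char *b)
{
    char t = *a;
    *a = *b;
    *b = t;
}

/* Reverse n chars in place, indexing the back half from a one-past-the-end
   pointer to char. Every access stays inside the array. */
void reverse_from_end(char *first, size_t n)
{
    assert(first != NULL);
    char *end = first + n; /* valid to form; end[0] itself is never accessed */
    for (size_t i = 0; i < n / 2; i++)
        swap_chars(&first[i], &end[-(ptrdiff_t)i - 1]);
}
```

A caller would pass the array and its element count, for example reverse_from_end(u, sizeof u / sizeof *u).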

Eric Postpischil answered Oct 31 '22 22:10


Although the Standard specifies that arrayLvalue[i] means (*((arrayLvalue)+(i))), which would be processed by taking the address of the first element of arrayLvalue, gcc sometimes treats [], when applied to an array-type value or lvalue, as an operator which behaves like an indexed version of .member syntax, yielding a value or lvalue which the compiler will treat as being part of the array type. I don't know if this is ever observable when the array-type operand isn't a member of a struct or union, but the effects are clearly demonstrable in cases where it is, and I know of nothing that would guarantee that similar logic wouldn't be applied to nested arrays.

struct foo {unsigned char x[12];};
int test1(struct foo *p1, struct foo *p2)
{
    p1->x[0] = 1;
    p2->x[1] = 2;
    return p1->x[0];
}
int test2(struct foo *p1, struct foo *p2)
{
    p1->x[0] = 1;
    (&p2->x[0])[1] = 2;
    return p1->x[0];
}

The code gcc generates for test1 will always return 1, while the generated code for test2 will return whatever is in p1->x[0]. I am unaware of anything in the Standard or the documentation for gcc that would suggest the two functions should behave differently, or that would say how one should force a compiler to generate code that accommodates the case where p1 and p2 happen to identify overlapping parts of an allocated block, should that be necessary. Although the optimization used in test1() would be reasonable for the function as written, I know of no documented interpretation of the Standard that would treat that case as UB but define the behavior of the code if it wrote to p2->x[0] instead of p2->x[1].

supercat answered Oct 31 '22 23:10