
Access an array from the end in C?


Tags: arrays, c, pointers, language-lawyer

I recently noticed that in C, there is an important difference between array and &array for the following declaration:

char array[] = {4, 8, 15, 16, 23, 42};

In most expressions, the former decays to a pointer to char (char *), while the latter is a pointer to an array of 6 chars (char (*)[6]). It is also notable that the expression a[b] is syntactic sugar for *(a + b). Indeed, you could write 2[array] and it works perfectly according to the standard.

So we could take advantage of this information to write this:

char last_element = (&array)[1][-1];

&array points to an object of 6 chars, so (&array)[1] is the array of 6 chars located right after array, which decays to a pointer to its first char. By indexing it with [-1], I am therefore accessing the last element of array.

With this I could, for example, reverse the entire array by swapping elements from both ends:

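/* XOR swap of *a and *b (assumes a and b point to different objects) */
void swap(char *a, char *b) { *a ^= *b; *b ^= *a; *a ^= *b; }

int main() {
    char u[] = {1,2,3,4,5,6,7,8,9,10};

    /* swap the i-th element from the front with the i-th element from the back */
    for (int i = 0; i < sizeof(u) / 2; i++)
        swap(&u[i], &(&u)[1][-i - 1]);
}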

Does this method for accessing an array by the end have flaws?

801

asked May 02 '20 13:05

nowox

2 Answers

The C standard does not define the behavior of (&array)[1].

Consider &array + 1. This is defined by the C standard, for two reasons:

- When doing pointer arithmetic, the result is defined only if it points within the same array, from the first element (with index 0) up to one beyond the last element.
- When doing pointer arithmetic, a pointer to a single object behaves like a pointer to an array with one element. In this case, &array is a pointer to a single object (that object is itself an array, but the arithmetic is done on the pointer-to-the-array, not on a pointer-to-an-element).

So &array + 1 is defined pointer arithmetic that points just beyond the end of array.

However, by definition of the subscript operator, (&array)[1] is *(&array + 1). While &array + 1 is defined, applying * to it is not. C 2018 6.5.6 8 explicitly tells us, about the result of pointer arithmetic, “If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.”

Because of the way most compilers are designed, the code in the question may move data around as you desire. However, this is not behavior you should rely on. You can obtain a valid pointer to just beyond the last element of the array with char *End = array + sizeof array / sizeof *array; and then use End[-1] to refer to the last element, End[-2] to refer to the penultimate element, and so on.

118

answered Oct 31 '22 22:10

Eric Postpischil

Although the Standard specifies that arrayLvalue[i] means (*((arrayLvalue)+(i))), which would be processed by taking the address of the first element of arrayLvalue, gcc sometimes treats [], when applied to an array-type value or lvalue, as an operator that behaves like an indexed version of the .member syntax: it yields a value or lvalue that the compiler treats as belonging to that array. I don't know whether this is ever observable when the array-type operand isn't a member of a struct or union, but the effects are clearly demonstrable when it is, and I know of nothing that would guarantee that similar logic wouldn't be applied to nested arrays.

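struct foo {unsigned char x[12];};

/* Writes x[1] by subscripting the array member directly. */
int test1(struct foo *p1, struct foo *p2)
{
    p1->x[0] = 1;
    p2->x[1] = 2;
    return p1->x[0];
}

/* Writes the same element through a pointer to the first element. */
int test2(struct foo *p1, struct foo *p2)
{
    p1->x[0] = 1;
    (&p2->x[0])[1] = 2;
    return p1->x[0];
}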

The code gcc generates for test1 always returns 1, while the code generated for test2 returns whatever is in p1->x[0] after the store through p2. I am unaware of anything in the Standard or the gcc documentation that would suggest the two functions should behave differently, nor of any way to force the compiler to generate code that accommodates the case where p1 and p2 identify overlapping parts of an allocated block, should that be necessary. Although the optimization used in test1() would be reasonable for the function as written, I know of no documented interpretation of the Standard that would treat that case as UB but would define the behavior if the code wrote to p2->x[0] instead of p2->x[1].

answered Oct 31 '22 23:10

supercat

