Difference between dereferencing pointer and accessing array elements

I remember an example where the difference between pointers and arrays was demonstrated.

An array decays to a pointer to its first element when passed as a function argument, but arrays and pointers are not equivalent, as demonstrated next:

```c
//file file1.c

int a[2] = {800, 801};
int b[2] = {100, 101};
```

```c
//file file2.c

extern int a[2];

// here b is declared as pointer,
// although the external unit defines it as an array
extern int *b;

int main() {

  int x1, x2;

  x1 = a[1]; // ok
  x2 = b[1]; // crash at runtime

  return 0;
}
```

The linker does not type-check for external variables so no errors are generated at compile time. The problem is that b is in fact an array, but the compilation unit file2 is unaware of that and treats b as a pointer, resulting in a crash when trying to dereference it.

I remember that when this was explained it made perfect sense, but now I can't remember the explanation, nor can I come up with it on my own.

So I guess the question is: how is an array treated differently from a pointer when accessing elements? (I thought that p[1] is converted to (the assembly equivalent of) *(p + 1) regardless of whether p is an array or a pointer. I am obviously wrong.)

The assembly generated by the two dereferences (VS 2013):
note: 1158000h and 1158008h are the memory addresses of a and b respectively

```
; x1 = a[1];
0115139E  mov         eax,4
011513A3  shl         eax,0
011513A6  mov         ecx,dword ptr [eax+1158000h]
011513AC  mov         dword ptr [x1],ecx

; x2 = b[1];
011513AF  mov         eax,4
011513B4  shl         eax,0
011513B7  mov         ecx,dword ptr ds:[1158008h]
011513BD  mov         edx,dword ptr [ecx+eax]
011513C0  mov         dword ptr [x2],edx
```

asked Feb 23 '14 18:02

bolov

2 Answers

Thanks to the link provided by @tesseract in the comments, Expert C Programming: Deep C Secrets (page 96), I came up with a simple answer (a dumbed-down version of the explanation in the book; for a full academic answer, read the book):

when declared int a[2]:
- the compiler has an address for a, where this variable is stored. This address is also the address of the array, since the type of the variable is array.
- Accessing a[1] means:
  - retrieving that address,
  - adding the offset and
  - accessing the memory at this computed new address.
when declared int *b:
- the compiler also has an address for b but this is the address of the pointer variable, not the array.
- So accessing b[1] means:
  - retrieving that address,
  - accessing that location to get the value of b, i.e. the address of the array
  - adding an offset to this address and then
  - accessing the final memory location.

answered Oct 30 '22 08:10

3 revs

```c
// in file2.c

extern int *b; // b is declared as a pointer to an integer

// in file1.c

int b[2] = {100, 101}; // b is defined and initialized as an array of 2 ints
```

The linker resolves both to the same memory address. However, since the symbol b has different types in file1.c and file2.c, the same memory location is interpreted differently.

```c
// in file2.c

int x2;  // assuming sizeof(int) == 4
x2 = b[1]; // b[1] == *(b+1) == *(100 + 1) == *(104) --> segfault
```

b[1] is evaluated as *(b+1). In file2.c, where b is declared as a pointer, this means: load the value stored at the location b is bound to, add 1 to it (pointer arithmetic, so the address advances by sizeof *b) to get a new address, load the value at that address into a CPU register, and store it at the location x2 is bound to. But the value stored at the location b is bound to is 100, the first array element. Assuming pointers are also 4 bytes wide, adding 1 gives 104 (sizeof *b is 4), and the program then reads from address 104. This is undefined behaviour and will most likely crash the program.

There is a difference in how the elements of an array are accessed and how the values pointed to by a pointer are accessed. Let's take an example.

```c
int a[] = {100, 800};
int *b = a;
```

a is an array of 2 integers, and b is a pointer to an integer initialized to the address of the first element of a. When a[1] is accessed, it means: get whatever is at offset 1 from the address of a[0], which is the address the symbol a is bound to. That takes one memory load. It's as if the base address were embedded in the array symbol, so the CPU can fetch an element at an offset from it in a single load.

When you access *b, b[0] or b[1], you first get the content of b, which is an address, then do the pointer arithmetic to get a new address, and then get whatever is at that address. So the CPU first loads the content of b, evaluates b+1 (for b[1]), and then loads the content at address b+1. That's two memory loads.

For an extern array, you don't need to specify its size. The only requirement is that the declaration must match the external definition. Therefore the following two declarations are equivalent:

```c
extern int a[2];  // equivalent to the below statement
extern int a[];
```

You must match the type of the variable in its declaration with its external definition. The linker doesn't check the types of variables when resolving symbol references. In C++, function symbols have the parameter types encoded into the name (name mangling), but in C neither functions nor variables carry type information in their symbol names. Therefore you won't get any warning or error, and the program compiles and links just fine.

Technically, the linker or some compiler component could track what type the symbol represents, and then give an error or warning. But there is no requirement from the standard to do so. You are required to do the right thing.

answered Oct 30 '22 07:10

ajay
