According to the c-faq, the compiler generates different code for a[i] depending on whether a is an array or a pointer. Here's an example from the c-faq:
char a[] = "hello"; char *p = "world";
Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location ``a'', move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location ``p'', fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to.
But I was also told that when dealing with a[i], the compiler converts a (which is an array) to a pointer to its first element. So I want to look at the assembly code to find out which is right.
EDIT:
Here's the source of this statement: c-faq. Note this sentence:
"an expression of the form a[i] causes the array to decay into a pointer, following the rule above, and then to be subscripted just as would be a pointer variable in the expression p[i] (although the eventual memory accesses will be different, ...)"
I'm confused by this: since a has decayed to a pointer, what does he mean by "memory accesses will be different"?
Here's my code:
// array.cpp
#include <cstdio>
using namespace std;
int main()
{
    char a[6] = "hello";
    const char *p = "world";  // string literals are const in C++
    printf("%c\n", a[3]);
    printf("%c\n", p[3]);
}
And here's part of the assembly code I got using g++ -S array.cpp:
.file "array.cpp"
.section .rodata
.LC0:
.string "world"
.LC1:
.string "%c\n"
.text
.globl main
.type main, @function
main:
.LFB2:
leal 4(%esp), %ecx
.LCFI0:
andl $-16, %esp
pushl -4(%ecx)
.LCFI1:
pushl %ebp
.LCFI2:
movl %esp, %ebp
.LCFI3:
pushl %ecx
.LCFI4:
subl $36, %esp
.LCFI5:
movl $1819043176, -14(%ebp)
movw $111, -10(%ebp)
movl $.LC0, -8(%ebp)
movzbl -11(%ebp), %eax
movsbl %al,%eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
movl -8(%ebp), %eax
addl $3, %eax
movzbl (%eax), %eax
movsbl %al,%eax
movl %eax, 4(%esp)
movl $.LC1, (%esp)
call printf
movl $0, %eax
addl $36, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
I cannot work out the mechanism of a[3] and p[3] from the code above. For example, note these three lines:
movl $1819043176, -14(%ebp)
movw $111, -10(%ebp)
movl $.LC0, -8(%ebp)
The last one uses movl, so why didn't it overwrite the content of -10(%ebp)? (I know the answer now: addresses increase upward, so movl $.LC0, -8(%ebp) only overwrites the four bytes at {-8, -7, -6, -5}(%ebp).)
I'm sorry, but I'm totally confused by the mechanism, as well as by the assembly code...
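The byte ranges touched by those three stores can be checked with a small sketch. The names Store and overlaps are hypothetical, chosen only for illustration, and the offsets and sizes are simply the ones that appear in the listing above.

```cpp
// Sketch only: models each store as a half-open byte range
// [offset, offset + size) relative to %ebp.
// Store and overlaps are hypothetical names for illustration.
struct Store {
    int offset;  // displacement from %ebp, e.g. -14 for -14(%ebp)
    int size;    // bytes written: 4 for movl, 2 for movw
};

// True when the two stores write at least one common byte.
bool overlaps(Store x, Store y) {
    return x.offset < y.offset + y.size && y.offset < x.offset + x.size;
}

// The three stores from the listing.
const Store store_hell = {-14, 4};  // movl $1819043176, -14(%ebp): bytes -14..-11
const Store store_o    = {-10, 2};  // movw $111, -10(%ebp):        bytes -10..-9
const Store store_p    = { -8, 4};  // movl $.LC0, -8(%ebp):        bytes  -8..-5
```

None of the three ranges share a byte, so overlaps returns false for every pair. A 4-byte store at -11(%ebp), by contrast, would cover -11..-8 and clobber the bytes written by the movw.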
Thank you very much for your help.
a is an array of chars (it decays to a pointer to its first element in most expressions). p is a pointer to a char which, in this case, points at a string literal.
movl $1819043176, -14(%ebp)
movw $111, -10(%ebp)
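These initialize the local array a with "hello" on the stack (that's why it is addressed relative to ebp). The six bytes of "hello", including the terminating '\0', do not fit in a single 4-byte store, so it takes two instructions: a 4-byte movl and a 2-byte movw.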
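To see that the two immediates really spell "hello", the constants can be decoded byte by byte. This is a minimal sketch assuming a little-endian target such as x86; little_endian_bytes is a hypothetical helper name, not something taken from the compiler output.

```cpp
#include <cstdint>
#include <string>

// Sketch only: returns the bytes that a little-endian store of `value`
// writes to memory, lowest address first.
// little_endian_bytes is a hypothetical helper for illustration.
std::string little_endian_bytes(std::uint32_t value, int size) {
    std::string bytes;
    for (int i = 0; i < size; ++i) {
        // Byte i of the value lands at address base + i.
        bytes.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
    return bytes;
}
```

little_endian_bytes(1819043176u, 4) yields "hell" (1819043176 is 0x6C6C6568), and little_endian_bytes(111u, 2) yields 'o' followed by the terminating '\0'.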
movzbl -11(%ebp), %eax
movsbl %al,%eax
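These read a[3]: a starts at -14(%ebp), so a[3] is the byte at -11(%ebp), and no pointer has to be loaded first. movzbl loads that byte into eax with zero extension; movsbl then sign-extends the low byte (al) to 32 bits, because a char passed to printf is promoted to int.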
movl -8(%ebp), %eax
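loads the value of the pointer p from the stack. The following addl $3, %eax and movzbl (%eax), %eax add three to it and fetch the character it points to.
LC0 is an assembler label for the string literal "world" in the read-only data section. Its final address is fixed once the program is linked and loaded into memory; movl $.LC0, -8(%ebp) stores that address into p.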
movsbl %al,%eax
means "move with sign extension, byte to long": it copies the byte in al into eax and fills the upper 24 bits with copies of its sign bit. al is the lowest byte of the register eax.
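The difference between the two widening moves can be written out in C++. This is only a sketch of what the instructions compute, and both function names are hypothetical.

```cpp
// Sketch only: what the two widening moves compute, written out in C++.
// Both function names are hypothetical.

// movzbl: widen a byte to 32 bits, filling the upper bits with zeros.
int zero_extend(unsigned char byte) {
    return byte;
}

// movsbl: widen a byte to 32 bits, copying bit 7 into the upper bits.
int sign_extend(unsigned char byte) {
    return byte >= 0x80 ? static_cast<int>(byte) - 256
                        : static_cast<int>(byte);
}
```

For 'l' (0x6C) both functions return 108, so the distinction is invisible in this program. For a byte with the top bit set, such as 0xE9, zero_extend gives 233 while sign_extend gives -23.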
Getting on the language side of this, since the assembler side has already been handled:
Note this sentence: "an expression of the form a[i] causes the array to decay into a pointer, following the rule above, and then to be subscripted just as would be a pointer variable in the expression p[i] (although the eventual memory accesses will be different, ...)" I'm confused by this: since a has decayed to a pointer, what does he mean by "memory accesses will be different"?
After the decay, the access is the same for the array (now a pointer value) and the pointer. The difference is in how that pointer value is obtained in the first place. Let's look at an example:
char c[1];
char cc;
char *pc = &cc;
Now, you have an array. This array does not take any storage other than one char! There is no pointer stored for it. And you have a pointer that points to a char. The pointer takes the size of one address, and you have one char that the pointer points to. Now let's look at what happens in the array case to get the pointer value:
c[0] = 'A';
// #1: equivalent: *(c + 0) = 'A';
// #2: => 'c' is not the operand of address-of or sizeof, so it decays
// #3: => get address of "c": This is the pointer value P1
The pointer case is different:
pc[0] = 'A';
// #1: equivalent: *(pc + 0) = 'A';
// #2: => pointer value is stored in 'pc'
// #3: => thus: read address stored in 'pc': This is the pointer value P1
As you see, in the array case we don't need to read from memory to get the pointer value to which the index (in this case a boring 0) is added, because the address of the array is already that pointer value. In the pointer case, the pointer value we need is stored in the pointer, so we need one read from memory to get that address.
After this, the path is equal for both:
// #4: add "0 * sizeof(char)" to P1. This is the address P2
// #5: store 'A' to address P2
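The steps above can be sketched in C++ by reusing the three declarations as globals. The helper names array_base, pointer_base, and store_at are hypothetical; they only separate the step that differs (obtaining the pointer value) from the steps that are shared.

```cpp
#include <cstddef>

// Sketch only, reusing the declarations from the example as globals.
// array_base, pointer_base and store_at are hypothetical helper names.
char c[1];
char cc;
char *pc = &cc;

// Array case: the pointer value is the address of c itself,
// so nothing has to be read from memory to obtain it.
char *array_base() { return c; }

// Pointer case: the pointer value is whatever is currently stored in pc,
// so it has to be read from pc's own storage.
char *pointer_base() { return pc; }

// Steps #4 and #5, identical for both cases once the pointer value is known.
void store_at(char *base, std::size_t index, char value) {
    *(base + index) = value;
}
```

store_at(array_base(), 0, 'A') writes into c, and store_at(pointer_base(), 0, 'A') writes into cc. Note also that sizeof(c) is 1 while sizeof(pc) is the size of an address, which matches the point that the array stores no pointer of its own.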
Here is the assembler code generated for the array and the pointer case:
add $2, $0, 65 ; write 65 into r2
stb $2, $0, c ; store r2 into address of c
# pointer case follows
ldw $3, $0, pc ; load value stored in pc into r3
add $2, $0, 65 ; write 65 into r2
stb $2, $3, 0 ; store r2 into address loaded to r3
In the array case, we can just store 65 (ASCII for 'A') at the address of c (which is already known at compile or link time when c is global). In the pointer case, we first have to load the address stored in pc into register 3, and then write the 65 to that address.