
Understanding Base Pointer and Stack Pointers: In Context with gcc Output

I have the following C program:

int main()
{
    int c[10] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 2};
    return c[0];
}

When I compile it with gcc's -S option, I get the following assembly:

    .file   "array.c"
    .text
.globl main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $0, -48(%rbp)
    movl    $0, -44(%rbp)
    movl    $0, -40(%rbp)
    movl    $0, -36(%rbp)
    movl    $0, -32(%rbp)
    movl    $0, -28(%rbp)
    movl    $0, -24(%rbp)
    movl    $0, -20(%rbp)
    movl    $1, -16(%rbp)
    movl    $2, -12(%rbp)
    movl    -48(%rbp), %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 4.4.5 20110214 (Red Hat 4.4.5-6)"
    .section        .note.GNU-stack,"",@progbits

What I do not understand is why the earlier array elements are further from the bp. It almost seems like the elements of the array are being placed in the opposite order.

Also why does gcc not use push instead of movl, to push the array elements onto the stack?


DIFFERENT VIEW

Moving the array to file scope as a static variable, I get:

    .file   "array.c"
    .data
    .align 32
    .type   c, @object
    .size   c, 40
c:
    .long   0
    .long   0
    .long   0
    .long   0
    .long   0
    .long   0
    .long   0
    .long   0
    .long   1
    .long   2
    .text
.globl main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    c(%rip), %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (GNU) 4.4.5 20110214 (Red Hat 4.4.5-6)"
    .section    .note.GNU-stack,"",@progbits

Using the following C program:

static int c[10] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 2};

int main() 
{
    return c[0];
}

This doesn't give more insight into the stack, but it is interesting to see how slightly different semantics produce different assembly output.

Matthew Hoggan asked Dec 12 '11 17:12



2 Answers

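First of all, the x86 stack grows downwards. By convention, the function prologue (pushq %rbp; movq %rsp, %rbp) leaves rbp holding the value rsp had at that point, which marks the base of the current frame. Therefore, whatever the caller placed on the stack (the return address and any stack-passed arguments) resides at positive offsets relative to rbp, and the function's automatic variables reside at negative offsets. Array elements are always laid out at increasing addresses, so the first element of an automatic array has a lower address than all other elements, and thus is the furthest away from rbp.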

Here is a handy diagram that appears on this page:

[Figure: stack layout]

I see no reason why the compiler couldn't use a series of push instructions to initialize your array. Whether this would be a good idea, I am not sure.

NPE answered Nov 15 '22 16:11



Also why does gcc not use push instead of movl, to push the array elements onto the stack?

It is quite rare to have a large initialized array in exactly the right place in the stack frame that you could use a sequence of pushes, so gcc has not been taught to do that. (In more detail: array initialization is handled as a block memory copy, which is emitted as either a sequence of move instructions or a call to memcpy, depending on how big it would be. The code that decides what to emit doesn't know where in memory the block is going, so it doesn't know if it could use push instead.)

Also, movl is faster. Specifically, push does an implicit read-modify-write of the stack pointer (%rsp in this 64-bit code), and therefore a sequence of pushes must execute in order. Stores made with movl to independent addresses, by contrast, can execute in parallel. So by using a sequence of movls rather than pushes, gcc offers the CPU more instruction-level parallelism to take advantage of.

Note that if I compile your code with any level of optimization activated, the array vanishes altogether! Here's -O1 (this is the result of running objdump -dr on an object file, rather than -S output, so you can see the actual machine code):

0000000000000000 <main>:
   0:   b8 00 00 00 00          mov    $0x0,%eax
   5:   c3                      retq   

and -Os:

0000000000000000 <main>:
   0:   31 c0                   xor    %eax,%eax
   2:   c3                      retq   

Doing nothing is always faster than doing something. Clearing a register with xor takes two bytes instead of five, but it has a formal data dependence on the old contents of the register (which many CPUs recognize as a zeroing idiom and ignore) and it modifies the condition codes, so it might be slower, and this version of gcc chooses it only when optimizing for size.

zwol answered Nov 15 '22 16:11
