Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding empty main()'s translation into assembly

Could somebody please explain what GCC is doing for this piece of code? What is it initializing? The original code is:

#include <stdio.h>
int main()
{

}

And it was translated to:

    .file   "test1.c"
    .def    ___main;    .scl    2;  .type   32; .endef
    .text
.globl _main
    .def    _main;  .scl    2;  .type   32; .endef
_main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $8, %esp
    andl    $-16, %esp
    movl    $0, %eax
    addl    $15, %eax
    addl    $15, %eax
    shrl    $4, %eax
    sall    $4, %eax
    movl    %eax, -4(%ebp)
    movl    -4(%ebp), %eax
    call    __alloca
    call    ___main
    leave
    ret

I would be grateful if a compiler/assembly guru got me started by explaining the stack, register and the section initializations. I cant make head or tail out of the code.

EDIT: I am using gcc 3.4.5. and the command line argument is gcc -S test1.c

Thank You, kunjaan.

like image 590
unj2 Avatar asked May 05 '09 03:05

unj2


People also ask

What does .fill do in assembly?

. FILL tells the assembler to set aside the next location in the program and initial- ize it with the value of the operand.

How do I convert ASM to C?

I can say with 99% guarantee, there is no ready converter for this assembly language, so you need to write one. You can simply implement it replacing ASM command with C function: movf BARGB2,w -> c_movf(BARGB2,w); subwf AARGB2,f -> c_subwf(AARGB2,f); This part is easy :) Then you need to implement each function.

What is the general format of an assembler instruction?

Each source statement may include up to four fields: a label, an operation (instruction mnemonic or assembler directive), an operand, and a comment. The following are examples of an assembly directive and a regular machine instruction.

What are assembly language instructions?

Assembly-language allows the designer to program in terms of the machine instructions that a specific processor can perform. Since binary machine-code instructions are difficult to understand directly, assembly-language programs are expressed in a symbolic notation.


1 Answers

I should preface all my comments by saying, I am still learning assembly.

I will ignore the section initialization. A explanation for the section initialization and basically everything else I cover can be found here: http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

The ebp register is the stack frame base pointer, hence the BP. It stores a pointer to the beginning of the current stack.

The esp register is the stack pointer. It holds the memory location of the top of the stack. Each time we push something on the stack esp is updated so that it always points to an address the top of the stack.

So ebp points to the base and esp points to the top. So the stack looks like:

esp -----> 000a3   fa
           000a4   21
           000a5   66
           000a6   23
ebp -----> 000a7   54

If you push e4 on the stack this is what happens:

esp -----> 000a2   e4
           000a3   fa
           000a4   21
           000a5   66
           000a6   23
ebp -----> 000a7   54

Notice that the stack grows towards lower addresses, this fact will be important below.

The first two steps are known as the procedure prolog or more commonly as the function prolog. They prepare the stack for use by local variables (See procedure prolog quote at the bottom).

In step 1 we save the pointer to the old stack frame on the stack by calling pushl %ebp. Since main is the first function called, I have no idea what the previous value of %ebp points too.

Step 2, We are entering a new stack frame because we are entering a new function (main). Therefore, we must set a new stack frame base pointer. We use the value in esp to be the beginning of our stack frame.

Step 3. Allocates 8 bytes of space on the stack. As we mentioned above, the stack grows toward lower addresses thus, subtracting by 8, moves the top of the stack by 8 bytes.

Step 4; Aligns the stack, I've found different opinions on this. I'm not really sure exactly what this is done. I suspect it is done to allow large instructions (SIMD) to be allocated on the stack,

http://gcc.gnu.org/ml/gcc/2008-01/msg00282.html

This code "and"s ESP with 0xFFFF0000, aligning the stack with the next lowest 16-byte boundary. An examination of Mingw's source code reveals that this may be for SIMD instructions appearing in the "_main" routine, which operate only on aligned addresses. Since our routine doesn't contain SIMD instructions, this line is unnecessary.

http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax

Steps 5 through 11 seem to have no purpose to me. I couldn't find any explanation on google. Could someone who really knows this stuff provide a deeper understanding. I've heard rumors that this stuff is used for C's exception handling.

Step 5, stores the return value of main 0, in eax.

Step 6 and 7 we add 15 in hex to eax for unknown reason. eax = 01111 + 01111 = 11110

Step 8 we shift the bits of eax 4 bits to the right. eax = 00001 because the last bits are shift off the end 00001 | 111.

Step 9 we shift the bits of eax 4 bits to the left, eax = 10000.

Steps 10 and 11 moves the value in the first 4 allocated bytes on the stack into eax and then moves it from eax back.

Steps 12 and 13 setup the c library.

We have reached the function epilogue. That is, the part of the function which returns the stack pointers, esp and ebp to the state they were in before this function was called.

Step 14, leave sets esp to the value of ebp, moving the top of stack to the address it was before main was called. Then it sets ebp to point to the address we saved on the top of the stack during step 1.

Leave can just be replaced with the following instructions:

mov  %ebp, %esp
pop  %ebp

Step 15, returns and exits the function.

1.    pushl       %ebp
2.    movl        %esp, %ebp
3.    subl        $8, %esp
4.    andl        $-16, %esp
5.    movl        $0, %eax
6.    addl        $15, %eax
7.    addl        $15, %eax
8.    shrl        $4, %eax
9.    sall        $4, %eax
10.   movl        %eax, -4(%ebp)
11.   movl        -4(%ebp), %eax
12.   call        __alloca
13.   call        ___main
14.   leave
15.   ret

Procedure Prolog:

The first thing a function has to do is called the procedure prolog. It first saves the current base pointer (ebp) with the instruction pushl %ebp (remember ebp is the register used for accessing function parameters and local variables). Now it copies the stack pointer (esp) to the base pointer (ebp) with the instruction movl %esp, %ebp. This allows you to access the function parameters as indexes from the base pointer. Local variables are always a subtraction from ebp, such as -4(%ebp) or (%ebp)-4 for the first local variable, the return value is always at 4(%ebp) or (%ebp)+4, each parameter or argument is at N*4+4(%ebp) such as 8(%ebp) for the first argument while the old ebp is at (%ebp).

http://www.milw0rm.com/papers/52

A really great stack overflow thread exists which answers much of this question. Why are there extra instructions in my gcc output?

A good reference on x86 machine code instructions can be found here: http://programminggroundup.blogspot.com/2007/01/appendix-b-common-x86-instructions.html

This a lecture which contains some of the ideas used below: http://csc.colstate.edu/bosworth/cpsc5155/Y2006_TheFall/MySlides/CPSC5155_L23.htm

Here is another take on answering your question: http://www.phiral.net/linuxasmone.htm

None of these sources explain everything.

like image 63
Ethan Heilman Avatar answered Nov 07 '22 04:11

Ethan Heilman