 

Understanding gcc -S output

I ran gcc -S on the very complex program below on x86_64:

int main() {
    int x = 3;
    x = 5;
    return 0;
}

And what I got was:

        .file   "main.c"
        .text
.globl main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $3, -4(%rbp)
        movl    $5, -4(%rbp)
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-3)"
        .section        .note.GNU-stack,"",@progbits

I was wondering if someone could help me understand the output or point me to a link that explains it. Specifically, what do the .cfi directives, .LFB0, .LFE0 and leave mean? All I could find regarding these is this post, but I couldn't fully understand what it was for. Also, what does ret do in this case? I'm guessing it returns to __libc_start_main(), which in turn would call do_exit(). Is that correct?

Shmoopy asked Mar 08 '13 01:03


2 Answers

Those .cfi_* directives make the assembler generate additional data (call frame information). This data helps traverse the call stack when an exception occurs, so the exception handler (if any) can be found and correctly executed; debuggers use the same information to produce backtraces. The data goes into a separate section of the executable (typically .eh_frame); it is not inserted between the instructions of your code.
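A minimal sketch of that data in use, assuming GCC (or Clang) on x86-64 Linux: the unwinder in libgcc exposes _Unwind_Backtrace and _Unwind_GetIP through <unwind.h>, and it steps from frame to frame using the .eh_frame tables built from the .cfi_* directives. The struct trace, capture_stack and print_stack names are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unwind.h>   /* GCC's unwinder interface, implemented in libgcc */

/* Instruction pointers collected while walking the stack (hypothetical). */
struct trace {
    uintptr_t ips[64];
    int count;
};

/* Called by the unwinder once per frame, innermost frame first. */
static _Unwind_Reason_Code on_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace *t = arg;
    if (t->count >= 64)
        return _URC_END_OF_STACK;          /* buffer full: stop the walk */
    t->ips[t->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;                 /* continue to the caller's frame */
}

/* Walks the current call stack; each step is driven by the .eh_frame
   data that the assembler built from the .cfi_* directives. */
__attribute__((noinline)) int capture_stack(struct trace *t)
{
    t->count = 0;
    _Unwind_Backtrace(on_frame, t);
    assert(t->count > 0 && "no unwind info for this frame?");
    return t->count;
}

/* Prints the collected addresses, innermost frame first. */
void print_stack(const struct trace *t)
{
    for (int i = 0; i < t->count; i++)
        printf("#%d  %p\n", i, (void *)t->ips[i]);
}
```

Calling capture_stack from inside a few nested functions and then print_stack prints one address per frame, innermost first. As a side note, compiling with gcc -S -fno-asynchronous-unwind-tables usually leaves the .cfi_* lines out of the listing, which can make small examples easier to read, at the cost of that unwind data.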

.LFB0 and .LFE0 (the .LF* labels) are just ordinary local labels that are probably referenced by that extra exception-related data.

leave and ret are CPU instructions.

leave is equivalent to:

movq    %rbp, %rsp
popq    %rbp

and it undoes the effect of these two instructions

pushq   %rbp
movq    %rsp, %rbp

and of any instructions that allocate space on the stack by subtracting something from %rsp.

ret returns from the function. It pops the return address from the stack and jumps to that address. If it was __libc_start_main() that called main(), then it returns there.
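To see call, the prologue, leave and ret working together, here is a toy C model that replays the question's main() on a simulated stack. It is a sketch, not an emulator: struct cpu, push64, pop64 and run_main are hypothetical names, and "addresses" are just offsets into a small byte array. Note that the listing never subtracts from %rsp: main is a leaf function, so on x86-64 it may keep x in the red zone below %rsp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: mem[] stands in for the stack, and the
   values in rsp/rbp are offsets into it. */
struct cpu {
    uint64_t rsp, rbp, rip;
    uint32_t eax;
    uint8_t mem[256];
};

/* pushq: decrement %rsp by 8, then store at the new top of stack */
static void push64(struct cpu *c, uint64_t v)
{
    assert(c->rsp >= 8 && c->rsp <= sizeof c->mem);
    c->rsp -= 8;
    memcpy(&c->mem[c->rsp], &v, sizeof v);
}

/* popq: load from the top of stack, then increment %rsp by 8 */
static uint64_t pop64(struct cpu *c)
{
    uint64_t v;
    assert(c->rsp + 8 <= sizeof c->mem);
    memcpy(&v, &c->mem[c->rsp], sizeof v);
    c->rsp += 8;
    return v;
}

/* leave == movq %rbp, %rsp ; popq %rbp */
static void leave(struct cpu *c)
{
    c->rsp = c->rbp;
    c->rbp = pop64(c);
}

/* ret: pop the return address pushed by call and jump to it */
static void ret(struct cpu *c)
{
    c->rip = pop64(c);
}

/* Replays main() from the gcc -S listing in the question. */
void run_main(struct cpu *c, uint64_t return_addr)
{
    uint32_t x;
    push64(c, return_addr);          /* the caller's call instruction */
    push64(c, c->rbp);               /* pushq %rbp */
    c->rbp = c->rsp;                 /* movq  %rsp, %rbp */
    /* no subq from %rsp: x is stored in the red zone below %rsp */
    x = 3; memcpy(&c->mem[c->rbp - 4], &x, sizeof x);  /* movl $3, -4(%rbp) */
    x = 5; memcpy(&c->mem[c->rbp - 4], &x, sizeof x);  /* movl $5, -4(%rbp) */
    c->eax = 0;                      /* movl  $0, %eax */
    leave(c);                        /* leave */
    ret(c);                          /* ret */
}
```

Starting from rsp = 200, run_main leaves rsp at 200 again, rbp at its original value, rip at the return address and eax at 0, which is exactly the state main's caller expects.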

Alexey Frunze answered Sep 21 '22 23:09


  1. .LFB0, .LFE0 are nothing but local labels.

  2. .cfi_startproc marks the beginning of each function, and .cfi_endproc marks its end.

    • These assembler directives help the assembler to put debugging and stack unwinding information into the executable.
  3. The leave instruction is an x86 instruction that restores the calling function's stack frame.
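What the .cfi_* lines in the listing actually describe is how to find the CFA (canonical frame address: the value %rsp had in the caller just before the call instruction) from the current registers, as the prologue and epilogue change them. On x86-64, DWARF register 6 is %rbp and 7 is %rsp, so .cfi_offset 6, -16 means "the caller's %rbp is saved at CFA - 16". The toy model below (hypothetical names, not how a real unwinder is structured) applies each directive and shows that the CFA stays constant throughout.

```c
#include <assert.h>
#include <stdint.h>

/* DWARF register numbers used by .cfi_* directives on x86-64 */
enum { DW_RBP = 6, DW_RSP = 7 };

/* CFA rule: CFA = (value of register reg) + offset */
struct cfa_rule {
    int reg;
    int64_t offset;
};

static uint64_t eval_cfa(struct cfa_rule r, uint64_t rsp, uint64_t rbp)
{
    assert(r.reg == DW_RSP || r.reg == DW_RBP);
    return (r.reg == DW_RSP ? rsp : rbp) + (uint64_t)r.offset;
}

/* Steps through main()'s prologue/epilogue from the listing, applies each
   .cfi_* directive, and records the CFA after each step in out[0..3].
   Returns the number of steps recorded. */
int trace_cfa(uint64_t entry_rsp, uint64_t caller_rbp, uint64_t out[4])
{
    uint64_t rsp = entry_rsp, rbp = caller_rbp;
    struct cfa_rule r = { DW_RSP, 8 };  /* at entry: CFA = %rsp + 8 */
    int n = 0;
    out[n++] = eval_cfa(r, rsp, rbp);

    rsp -= 8;                           /* pushq %rbp */
    r.offset = 16;                      /* .cfi_def_cfa_offset 16 */
    /* .cfi_offset 6, -16: the caller's %rbp was just saved at CFA - 16 */
    assert(eval_cfa(r, rsp, rbp) - 16 == rsp);
    out[n++] = eval_cfa(r, rsp, rbp);

    rbp = rsp;                          /* movq %rsp, %rbp */
    r.reg = DW_RBP;                     /* .cfi_def_cfa_register 6 */
    out[n++] = eval_cfa(r, rsp, rbp);

    rsp = rbp + 8;                      /* leave: movq %rbp, %rsp; popq %rbp */
    rbp = caller_rbp;
    r.reg = DW_RSP;                     /* .cfi_def_cfa 7, 8 */
    r.offset = 8;
    out[n++] = eval_cfa(r, rsp, rbp);
    return n;
}
```

For example, trace_cfa(1000, 42, out) fills out[0..3] with the same value, 1008: the entry %rsp plus the 8 bytes the call instruction pushed.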

Finally, right after the ret instruction executes, the following holds:

  • %rip contains return address
  • %rsp points at arguments pushed by caller that didn't fit in the six registers used to pass arguments on amd64 (%rdi, %rsi, %rdx, %rcx, %r8, %r9)
  • called function may have trashed arguments
  • %rax contains return value (or trash if function is void) (or %rax and %rdx contain the return value if its size is >8 bytes but <=16 bytes)
  • %r10, %r11 may be trashed
  • %rbp, %rbx, %r12, %r13, %r14, %r15 must contain contents from time of call
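To see the %rax/%rdx case from the list above, one option is to compile something like the following with gcc -O2 -S and compare the two functions. This is a sketch assuming the x86-64 System V ABI with an 8-byte long (LP64, as on Linux); the struct and function names are hypothetical.

```c
#include <assert.h>

/* Two 8-byte integer fields (16 bytes): returned in %rax (lo) and %rdx (hi). */
struct pair { long lo, hi; };

/* Three 8-byte fields (24 bytes): too large for registers. The caller
   passes a hidden pointer to the result buffer in %rdi, and the callee
   returns that same pointer in %rax. */
struct triple { long a, b, c; };

static_assert(sizeof(struct pair) == 16, "assumes LP64: 8-byte long");
static_assert(sizeof(struct triple) == 24, "assumes LP64: 8-byte long");

struct pair make_pair(long lo, long hi)
{
    struct pair p = { lo, hi };
    return p;
}

struct triple make_triple(long a, long b, long c)
{
    struct triple t = { a, b, c };
    return t;
}
```

In the -O2 listing, make_pair should be little more than copying %rdi and %rsi into %rax and %rdx, while in make_triple the three arguments arrive one register later (%rsi, %rdx, %rcx) because %rdi carries the hidden pointer.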

Additional information can be found here (SO question) and here (standards PDFs).

Or, on 32-bit:

  • %eip contains return address
  • %esp points at arguments pushed by caller
  • called function may have trashed arguments
  • %eax contains return value (or trash if function is void)
  • %ecx, %edx may be trashed
  • %ebp, %ebx, %esi, %edi must contain contents from time of call

Travis G answered Sep 22 '22 23:09