
Compilers: Understanding assembly code generated from small programs

I'm self-studying how compilers work. I'm learning by reading the disassembly of GCC-generated code from small 64-bit Linux programs.

I wrote this C program:

#include <stdio.h>

int main()
{
    for(int i=0;i<10;i++){
        int k=0;
    }
}

After using objdump I get:

00000000004004d6 <main>:
  4004d6:       55                      push   rbp
  4004d7:       48 89 e5                mov    rbp,rsp
  4004da:       c7 45 f8 00 00 00 00    mov    DWORD PTR [rbp-0x8],0x0
  4004e1:       eb 0b                   jmp    4004ee <main+0x18>
  4004e3:       c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
  4004ea:       83 45 f8 01             add    DWORD PTR [rbp-0x8],0x1
  4004ee:       83 7d f8 09             cmp    DWORD PTR [rbp-0x8],0x9
  4004f2:       7e ef                   jle    4004e3 <main+0xd>
  4004f4:       b8 00 00 00 00          mov    eax,0x0
  4004f9:       5d                      pop    rbp
  4004fa:       c3                      ret    
  4004fb:       0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

Now I have some doubts.

  1. What is that NOP at the end for, and why is it there? (alignment?)

  2. I'm compiling with gcc -Wall <program.c>. Why am I not getting the warning control reaches end of non-void function?

  3. Why doesn't the compiler allocate space on the stack with sub rsp,0x10? Why doesn't it use the rbp register for referencing local stack data?

    PS: If I call a function (like printf) in the for loop, why does the compiler suddenly generate sub rsp,0x10? Why does it still reference local data with the rsp register? I expected the generated code to reference local stack data with rbp!

Ofey asked Mar 24 '17 07:03



2 Answers

Regarding the second question: since the C99 standard, main is allowed to omit an explicit return 0; reaching the closing } of main is treated as returning 0. Note that this applies only to main, not to any other function.

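As for the third question, the rbp register acts as the frame pointer, and the generated code does address the locals through it: i is at [rbp-0x8] and k is at [rbp-0x4].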
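To make the mapping between the listing in the question and the C source easier to follow, here is a rough sketch of the same loop written with gotos that mirror the jumps. The function name loop_as_gotos and the return value are my own additions for illustration; the original main just returns 0.

```c
/* Hypothetical helper: the loop from the question's main(), restructured
 * so that each statement lines up with one instruction of the listing.
 * It returns the final value of i so the effect can be observed. */
int loop_as_gotos(void)
{
    int i;              /* stack slot [rbp-0x8] */
    int k;              /* stack slot [rbp-0x4] */

    i = 0;              /* mov DWORD PTR [rbp-0x8],0x0 */
    goto check;         /* jmp 4004ee: test the condition first */
body:
    k = 0;              /* mov DWORD PTR [rbp-0x4],0x0 */
    i += 1;             /* add DWORD PTR [rbp-0x8],0x1 */
check:
    if (i <= 9)         /* cmp DWORD PTR [rbp-0x8],0x9 ; jle 4004e3 */
        goto body;

    (void)k;            /* k is only written, as in the original */
    return i;
}
```

Note how i < 10 shows up as a comparison against 9 followed by jle, and how the condition check sits at the bottom of the loop with an initial jump into it.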

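Lastly, the PS. The sub rsp,0x10 reserves 16 bytes (0x10) of stack for main's own locals, not for the arguments of the called function: on x86-64 System V the first integer and pointer arguments are passed in registers. Subtracting from rsp is what allocates the space (the stack grows downward), and the space is released again at the end of the function. The two 4-byte locals need only 8 bytes, but the amount is rounded up so that rsp stays 16-byte aligned at the call.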

If you're serious about learning how compilers work in general, and possibly want to create your own (it's fun! :)), then I suggest you invest in some books about the theory and practice of it. The dragon book is an excellent addition to any programmer's bookshelf.

Some programmer dude answered Oct 29 '22 22:10


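  1. Yes, the nop is for alignment. Here the 5-byte nop pads from 0x4004fb up to 0x400500, so whatever follows main starts on a 16-byte boundary. Compilers use different instructions for different lengths of padding needed, knowing that modern CPUs will be pre-fetching and decoding several instructions ahead.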

  2. As others have said, C99 specifies that reaching the closing } of main() without an explicit return statement returns 0 (see 5.1.2.2.3 in C99 TC3), so no warning is raised.

  3. The x86-64 System V ABI used on Linux reserves a 128-byte "red zone" below the current stack pointer that leaf functions (functions that do not call any other functions, and your main() is one such) can use for local variables and other scratch values without having to sub rsp / add rsp. Since rsp is never adjusted after mov rbp,rsp, rbp == rsp for the whole function, and the locals at [rbp-0x8] and [rbp-0x4] sit inside the red zone.
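To check the alignment arithmetic from point 1 yourself, a small sketch like this works. The helper name padding_to_align is made up for illustration, and the 16-byte alignment is an assumption read off the addresses in the listing, not something the compiler is guaranteed to use.

```c
#include <stdint.h>

/* Hypothetical helper: number of padding bytes needed to move addr up to
 * the next multiple of align. align must be a power of two.
 * Returns 0 when addr is already aligned. */
uint64_t padding_to_align(uint64_t addr, uint64_t align)
{
    uint64_t mask = align - 1;
    return (align - (addr & mask)) & mask;
}
```

With the numbers from the question, padding_to_align(0x4004fb, 16) gives 5, which matches the 5-byte nop (0f 1f 44 00 00) that follows ret.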

And for the PS: when you call a function in the for() loop (or anywhere in your main()), main() is no longer a leaf function, so the compiler can no longer use the red zone (the call itself pushes a return address just below rsp). That's why it allocates space on the stack with sub rsp,0x10. However, it knows the relationship between rsp and rbp, so it can use either when accessing data.
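If you want to see the leaf versus non-leaf difference side by side, something along these lines should do. The function names leaf_loop and nonleaf_loop are placeholders of mine, and the exact instructions you get will depend on your GCC version and flags.

```c
#include <stdio.h>

/* Leaf function: it calls nothing, so on x86-64 System V the compiler is
 * free to keep i and k in the red zone and leave rsp untouched. */
int leaf_loop(void)
{
    int i;
    for (i = 0; i < 10; i++) {
        int k = 0;
        (void)k;        /* k is deliberately unused, as in the question */
    }
    return i;
}

/* Non-leaf function: the call to printf means the area below rsp is no
 * longer safe, so the compiler has to reserve a real stack frame. */
int nonleaf_loop(void)
{
    int i;
    for (i = 0; i < 10; i++) {
        int k = 0;
        printf("%d %d\n", i, k);
    }
    return i;
}
```

Compile this with gcc -O0 -c and look at it with objdump -d -M intel; I would expect sub rsp,0x10 only in nonleaf_loop. Building with -mno-red-zone should make the leaf version adjust rsp as well.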

user7761803 answered Oct 29 '22 21:10