I am a student and have just started studying assembly language. To understand it better, I wrote a short program in C and converted it to assembly language. Surprisingly, I didn't understand any of it.
The code is:
#include <stdio.h>
int main()
{
    int n;
    n=4;
    printf("%d",n);
    return 0;
}
And the corresponding assembly language is:
.file "delta.c"
.section .rodata
.LC0:
.string "%d"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
movl $4, 28(%esp)
movl $.LC0, %eax
movl 28(%esp), %edx
movl %edx, 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
What do these mean?
An assembly language is a type of low-level programming language that is intended to communicate directly with a computer's hardware. Unlike machine language, which consists of binary and hexadecimal characters, assembly languages are designed to be readable by humans.
Assembly language lets programmers write human-readable code that maps closely onto machine language. Machine language is difficult to read because it is just a series of numbers. Assembly language gives the programmer full control over which instructions the computer executes.
Consider a very simple instruction, mov AL, 00H, which moves the value 00 (hex) into the AL register of the 8086. When the program is executed, the bytes B0 00 are read from memory, decoded, and carried out. (B4 00 would instead encode mov AH, 00H.) The term "statement" is usually used to describe a line in an assembly language program.
An assembly language is a low-level programming language designed for a specific type of processor. It may be produced by compiling source code from a high-level programming language (such as C/C++) but can also be written from scratch. Assembly code can be converted to machine code using an assembler.
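To make the mapping between mnemonics and bytes concrete, here is a minimal sketch in C of a decoder for just one instruction form, "mov r8, imm8" (opcodes B0 through B7, where the low three bits select the register). The function and array names are made up for illustration, and a real disassembler handles far more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-bit register names in x86 encoding order (index 0..7). */
static const char *const REG8_NAMES[8] = {
    "AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"
};

/* Example machine code: opcode byte followed by the immediate byte. */
static const uint8_t MOV_AL_00[2] = { 0xB0, 0x00 };  /* mov AL, 00H */
static const uint8_t MOV_AH_00[2] = { 0xB4, 0x00 };  /* mov AH, 00H */

/*
 * Hypothetical helper: if code[0] is a "mov r8, imm8" opcode (0xB0..0xB7),
 * return the name of the destination register; otherwise return NULL.
 * The immediate operand is simply the next byte, code[1].
 */
static const char *mov_r8_imm8_dest(const uint8_t *code)
{
    assert(code != NULL);
    if (code[0] < 0xB0 || code[0] > 0xB7) {
        return NULL;
    }
    return REG8_NAMES[code[0] & 0x07];
}
```

Calling mov_r8_imm8_dest(MOV_AL_00) yields "AL", while the same call on MOV_AH_00 yields "AH", which is why the opcode byte matters as much as the operand.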
Let's break it down:
.file "delta.c"
The compiler is using this to tell you the source file that the assembly came from. It doesn't mean much to the assembler.
.section .rodata
This starts a new section. "rodata" is the name for the "read-only data" section. Data in this section is written to the executable and memory-mapped as read-only when the program is loaded. The ".rodata" pages of an executable image can be shared by all the processes that load the image.
Generally, any compile-time constants in your source code that can't be folded directly into instructions (as immediate operands) end up being stored in the read-only data section.
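As a rough illustration of what ends up in read-only data, the sketch below contrasts a string literal with a local array initialized from one. The function names are hypothetical, and the exact section placement is up to the compiler; with GCC on Linux the literal typically lands in .rodata.

```c
#include <stddef.h>

/*
 * A string literal has static storage duration, so returning a pointer to it
 * is safe. The bytes '%', 'd', '\0' are typically emitted into .rodata,
 * just like the .LC0 string in the listing.
 */
static const char *format_for_int(void)
{
    return "%d";
}

/*
 * By contrast, this array is an ordinary writable local that is initialized
 * from the literal on each call, so modifying it is legal.
 */
static size_t length_of_local_copy(void)
{
    char copy[] = "%d";
    size_t n = 0;

    copy[0] = '#';              /* writes to the stack copy, not to .rodata */
    while (copy[n] != '\0') {
        n++;
    }
    return n;
}
```

Writing through the pointer returned by format_for_int would be undefined behavior, and on most systems it crashes precisely because the page is mapped read-only.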
.LC0:
.string "%d"
The ".LC0:" part is a label. It provides a symbolic name that references the bytes that occur after it in the file. In this case ".LC0" represents the string "%d". The GNU assembler uses the convention that labels starting with ".L" on ELF targets (or a plain "L" on some other object formats) are considered "local labels". This has a technical meaning that is mostly interesting to people who write compilers and linkers. Here the compiler uses it to refer to a symbol that is private to a particular object file, namely a string constant.
.text
This starts a new section. The "text" section is the section in object files that stores executable code.
.globl main
The ".globl" directive (".global" is an accepted synonym) tells the assembler to add the label that follows it to the list of labels "exported" by the generated object file. This basically means "this is a symbol that should be visible to the linker". For example, a non-static function in C can be called by any C file that declares (or includes) a compatible function prototype. This is why you can #include <stdio.h> and then call printf. When any non-static C function is compiled, the compiler generates assembly that declares a global label pointing at the beginning of the function. Contrast this with things that shouldn't be visible to the linker, such as string literals. The assembly code in the object file still needs a label to refer to the literal data. Those are "local" symbols.
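Here is a small sketch of the difference in C terms. The function names are invented for the example; if you compile it with gcc -S you would expect a .globl line for the non-static functions and none for the static one, though the exact output depends on the compiler version and options.

```c
/* External linkage: the compiler marks this symbol global so that
   other object files can call it. */
int add_one(int x)
{
    return x + 1;
}

/* Internal linkage: "static" keeps the symbol private to this file,
   so no .globl directive is expected for it. */
static int add_two(int x)
{
    return add_one(add_one(x));
}

/* External again; it can use the private helper because both live
   in the same translation unit. */
int add_three(int x)
{
    return add_one(add_two(x));
}
```

Another file could declare int add_three(int); and call it, but a call to add_two from that file would fail at link time.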
.type main, @function
This tells the assembler that the symbol "main" refers to executable code, as opposed to data. On ELF targets, GAS (the GNU assembler) records this as the symbol's type in the object file's symbol table.
main:
This defines the entry point for your "main" function.
.LFB0:
This is a "local label" that refers to the start of the function.
.cfi_startproc
This is a "call frame information" (CFI) directive. It marks the start of a function for which the assembler should emit DWARF call frame information, which debuggers and exception handlers use to unwind the stack.
pushl %ebp
This is a standard part of a function "prologue" in assembly code. It saves the current value of the "ebp" register. The "ebp" or "base pointer" register is used to store the base of the stack frame within a function. Whereas the "esp" ("stack pointer") register can change as functions are called within a function, "ebp" remains fixed, so any arguments to the function can always be accessed relative to "ebp". By the ABI calling conventions, before a function can modify the EBP register it must save it, so that the original value can be restored before the function returns.
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
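These are also DWARF call frame directives. The first records that the canonical frame address (CFA) is now 8 bytes above esp (the return address plus the saved ebp). The second records that register 5 (ebp in the DWARF numbering for 32-bit x86) was saved at CFA-8.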
movl %esp, %ebp
GAS uses AT&T syntax, in which the source operand comes first and the destination second, the reverse of the order used in the Intel manuals. This instruction means "set ebp equal to esp". This basically establishes the "base pointer" for the rest of the function.
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
This is also part of the prologue for the function. The first instruction aligns the stack pointer down to a 16-byte boundary, and the second subtracts 32 from it to make room for the function's locals and the arguments of the calls it makes.
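If the and-with-minus-16 trick looks odd, this sketch models the two instructions as plain C arithmetic on a 32-bit value. The function names and the sample address in the usage note are made up for illustration.

```c
#include <stdint.h>

/*
 * Model of "andl $-16, %esp": as a 32-bit value, -16 is 0xFFFFFFF0,
 * so the AND clears the low four bits and rounds the stack pointer
 * down to a multiple of 16.
 */
static uint32_t align_down_16(uint32_t esp)
{
    return esp & (uint32_t)-16;
}

/* Model of the pair "andl $-16, %esp; subl $32, %esp". */
static uint32_t reserve_frame(uint32_t esp)
{
    return align_down_16(esp) - 32u;
}
```

For a hypothetical esp of 0xBFFFF12C, align_down_16 gives 0xBFFFF120 and reserve_frame gives 0xBFFFF100. The stack grows toward lower addresses, which is why reserving space is a subtraction.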
movl $4, 28(%esp)
This loads the 32 bit integer constant 4 into a slot in the stack frame.
movl $.LC0, %eax
This loads the address of the "%d" string constant defined above into eax.
movl 28(%esp), %edx
This loads the value 4, stored at offset 28 from the stack pointer, into edx. Storing a value and then immediately reloading it is redundant, so chances are your code was compiled with optimizations turned off.
movl %edx, 4(%esp)
This then stores the value 4 at offset 4 from the stack pointer, which is where printf expects its second argument.
movl %eax, (%esp)
This stores the address of the "%d" string at the top of the stack, which is where printf expects its first argument.
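The five mov instructions so far can be replayed in C against a toy model of the 32-byte frame. This is only a sketch: the frame is an array of eight 4-byte slots, FAKE_LC0_ADDRESS is an invented stand-in for wherever the linker places the "%d" string, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Invented placeholder for the address of .LC0. */
#define FAKE_LC0_ADDRESS 0x08048500u

/*
 * Replays the mov sequence on a model of the frame and returns the
 * 4-byte slot at the given byte offset from esp (0, 4, ..., 28).
 */
static uint32_t slot_before_call(unsigned offset)
{
    uint32_t frame[8] = { 0 };  /* the 32 bytes from "subl $32, %esp" */
    uint32_t eax, edx;

    assert(offset % 4 == 0 && offset < 32);

    frame[28 / 4] = 4;          /* movl $4, 28(%esp)   : n = 4              */
    eax = FAKE_LC0_ADDRESS;     /* movl $.LC0, %eax    : address of "%d"    */
    edx = frame[28 / 4];        /* movl 28(%esp), %edx : reload n           */
    frame[4 / 4] = edx;         /* movl %edx, 4(%esp)  : second argument    */
    frame[0] = eax;             /* movl %eax, (%esp)   : first argument     */

    return frame[offset / 4];
}
```

Just before the call, offset 0 holds the format string's address, offset 4 holds 4, and offset 28 still holds the local variable n.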
call printf
This calls printf.
movl $0, %eax
This sets eax to 0. Given that the next instructions are "leave" and "ret", this is equivalent to "return 0" in C code. The EAX register is used to hold your function's return value.
leave
This instruction cleans up the call frame. It sets ESP back to EBP, then pops the saved value of EBP off the stack. Like the next instruction, this is part of the function's epilogue.
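To see why "leave" undoes the whole prologue regardless of how much the stack pointer was adjusted, here is a toy simulation in C. Everything in it (the toy_cpu struct, the tiny 256-byte stack, the function names) is invented for illustration and only models the two registers and the stack memory.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64u

/* Toy machine: esp and ebp are byte offsets into a small stack. */
struct toy_cpu {
    uint32_t esp;
    uint32_t ebp;
    uint32_t mem[STACK_WORDS];
};

static void toy_push(struct toy_cpu *cpu, uint32_t value)
{
    cpu->esp -= 4;
    cpu->mem[cpu->esp / 4] = value;
}

static uint32_t toy_pop(struct toy_cpu *cpu)
{
    uint32_t value = cpu->mem[cpu->esp / 4];
    cpu->esp += 4;
    return value;
}

/* pushl %ebp; movl %esp, %ebp; andl $-16, %esp; subl $32, %esp */
static void toy_prologue(struct toy_cpu *cpu)
{
    toy_push(cpu, cpu->ebp);
    cpu->ebp = cpu->esp;
    cpu->esp &= (uint32_t)-16;
    cpu->esp -= 32;
}

/* leave behaves like: movl %ebp, %esp; popl %ebp */
static void toy_leave(struct toy_cpu *cpu)
{
    cpu->esp = cpu->ebp;
    cpu->ebp = toy_pop(cpu);
}

/* Runs prologue then leave; returns 1 if both registers are restored. */
static int frame_round_trip(uint32_t esp_at_entry, uint32_t caller_ebp)
{
    struct toy_cpu cpu = { .esp = esp_at_entry, .ebp = caller_ebp };

    assert(esp_at_entry % 4 == 0);
    assert(esp_at_entry >= 64 && esp_at_entry <= STACK_WORDS * 4);

    toy_prologue(&cpu);
    toy_leave(&cpu);
    return cpu.esp == esp_at_entry && cpu.ebp == caller_ebp;
}
```

Because ebp remembers where the stack pointer was right after the push, the alignment and the 32-byte reservation never need to be reversed explicitly.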
.cfi_restore 5
.cfi_def_cfa 4, 4
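This is more DWARF call frame information: it records that ebp has been restored and that the CFA is once again 4 bytes above esp.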
ret
This is the actual return instruction. It returns from the function.
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
For me, Intel syntax is easier to read, and learning how to generate it is handy for understanding C programs better:
gcc -S -masm=intel file.c
On Windows, your C program becomes:
.file "file.c"
.intel_syntax noprefix
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "%d\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB13:
.cfi_startproc
push ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
mov ebp, esp
.cfi_def_cfa_register 5
and esp, -16
sub esp, 32
call ___main
mov DWORD PTR [esp+28], 4
mov eax, DWORD PTR [esp+28]
mov DWORD PTR [esp+4], eax
mov DWORD PTR [esp], OFFSET FLAT:LC0
call _printf
mov eax, 0
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE13:
.ident "GCC: (rev2, Built by MinGW-builds project) 4.8.1"
.def _printf; .scl 2; .type 32; .endef
(The compiler options should be the same on Ubuntu as on Windows.)
Apart from the psychotic labels, this is more like the assembly I read about in textbooks.
Here is a way of looking at it:
call ___main
mov DWORD PTR [esp+28], 4 ; int n = 4;
mov eax, DWORD PTR [esp+28]
mov DWORD PTR [esp+4], eax
mov DWORD PTR [esp], OFFSET FLAT:LC0
call _printf ; printf("%d",n);
mov eax, 0
leave ; return 0;