To generate assembly code, we ask GCC to stop before the assembly stage of compilation and dump what the compiler backend has generated. This writes the assembly code to a foobar.s file. For x86 and x64 assembly code, AT&T syntax is used by default.
GCC compiles to assembler. Some other compilers don't. For example, LLVM-GCC compiles to LLVM-assembly or LLVM-bytecode, which is then compiled to machine code.
GCC can emit the intermediate output of each compilation stage. To get the assembler output, use the -S option. It shows the output after compilation proper, but before it is sent to the assembler.
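As a minimal sketch of the -S workflow described above, the following shell session writes a tiny C file, stops GCC after the compilation stage, and lists what was produced. The file name square.c and the scratch directory are made up for illustration. The second command uses -save-temps, a GCC option that keeps the intermediate files; reading "all intermediate outputs" as a reference to that option is an assumption.

```shell
# Work in a scratch directory so nothing in the current project is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input file, used only for this illustration.
cat > square.c <<'EOF'
int square(int x)
{
    return x * x;
}
EOF

# -S: stop after compilation proper; the assembly lands in square.s.
gcc -S square.c

# -save-temps with -c: also keep the preprocessed source (square.i)
# and the assembly (square.s) next to the object file (square.o).
gcc -save-temps -c square.c

ls square.*
```

The .s file is plain text, so any editor or pager can be used to read it.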
GCC in capitals stands for the GNU Compiler Collection, which supports languages such as C and C++. In all lower case, gcc is the command that drives the compiler (the name originally stood for GNU C Compiler). It compiles your C code into object code, that is, machine code.
If you compile with debug symbols (add -g to your GCC command line, even if you're also using -O3; see footnote 1), you can use objdump -S to produce a more readable disassembly interleaved with C source.
>objdump --help
[...]
-S, --source Intermix source code with disassembly
-l, --line-numbers Include line numbers and filenames in output
objdump -drwC -Mintel is nice:
- -r shows symbol names on relocations (so you'd see puts in the call instruction below)
- -R shows dynamic-linking relocations / symbol names (useful on shared libraries)
- -C demangles C++ symbol names
- -w is "wide" mode: it doesn't line-wrap the machine-code bytes
- -Mintel: use GAS/binutils MASM-like .intel_syntax noprefix syntax instead of AT&T
- -S: interleave source lines with disassembly.
You could put something like alias disas="objdump -drwCS -Mintel" in your ~/.bashrc. If not on x86, or if you like AT&T syntax, omit -Mintel.
Example:
> gcc -g -c test.c
> objdump -d -M intel -S test.o
test.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
#include <stdio.h>
int main(void)
{
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 e4 f0 and esp,0xfffffff0
6: 83 ec 10 sub esp,0x10
puts("test");
9: c7 04 24 00 00 00 00 mov DWORD PTR [esp],0x0
10: e8 fc ff ff ff call 11 <main+0x11>
return 0;
15: b8 00 00 00 00 mov eax,0x0
}
1a: c9 leave
1b: c3 ret
Note that this isn't using -r, so the call rel32=-4 isn't annotated with the puts symbol name, and it looks like a broken call that jumps into the middle of the call instruction in main. Remember that the rel32 displacement in the call encoding is just a placeholder until the linker fills in a real offset (to a PLT stub in this case, unless you statically link libc).
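To see the difference -r makes, here is a sketch that rebuilds the same one-line program and disassembles the object file with relocations shown. It assumes GNU binutils objdump is installed. test.c is recreated in a scratch directory, test.dis is a made-up output name, and -Mintel is left out so the commands are not tied to x86.

```shell
workdir=$(mktemp -d)
cd "$workdir"

cat > test.c <<'EOF'
#include <stdio.h>
int main(void)
{
    puts("test");
    return 0;
}
EOF

# -g adds debug info; -c stops before linking, so the call target is unresolved.
gcc -g -c test.c

# -d disassemble, -r show relocations inline, -w wide output, -C demangle.
# Add -Mintel on x86 if you prefer Intel syntax.
objdump -drwC test.o > test.dis

# The relocation entry next to the call names the real target.
grep puts test.dis
```

The exact relocation type printed next to the call depends on the target architecture and on how your GCC is configured, but the entry should name puts.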
Footnote 1: Interleaving source can be messy and not very helpful in optimized builds; for that, consider https://godbolt.org/ or other ways of visualizing which instructions go with which source lines. In optimized code there's not always a single source line that accounts for an instruction but the debug info will pick one source line for each asm instruction.
If you give GCC the flag -fverbose-asm, it will
Put extra commentary information in the generated assembly code to make it more readable.
[...] The added comments include:
- information on the compiler version and command-line options,
- the source code lines associated with the assembly instructions, in the form FILENAME:LINENUMBER:CONTENT OF LINE,
- hints on which high-level expressions correspond to the various assembly instruction operands.
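One way to see what these comments add is to generate the listing twice and compare. This is only a sketch: the function add and the output names plain.s and verbose.s are invented for illustration, and the exact comments depend on the GCC version.

```shell
workdir=$(mktemp -d)
cd "$workdir"

cat > source.c <<'EOF'
int add(int a, int b)
{
    int sum = a + b;
    return sum;
}
EOF

# Plain listing and annotated listing, kept apart with -o for comparison.
gcc -S -O source.c -o plain.s
gcc -S -O -fverbose-asm source.c -o verbose.s

# The annotated file carries extra comment lines, so it is normally longer.
wc -l plain.s verbose.s
```

Opening verbose.s in an editor shows the version and option comments near the top and the per-instruction annotations further down.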
Use the -S (note: capital S) switch to GCC, and it will emit the assembly code to a file with a .s extension. For example, the following command:
gcc -O2 -S foo.c
will leave the generated assembly code in the file foo.s.
Ripped straight from http://www.delorie.com/djgpp/v2faq/faq8_20.html (but removing the erroneous -c)
Using the -S switch to GCC on x86-based systems produces a dump in AT&T syntax by default; this can also be requested explicitly with the -masm=att switch, like so:
gcc -S -masm=att code.c
Whereas if you'd like to produce a dump in Intel syntax, you could use the -masm=intel switch, like so:
gcc -S -masm=intel code.c
(Both produce dumps of code.c, each in its respective syntax, into the file code.s.)
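Because both commands write to code.s by default, comparing the two syntaxes needs separate output files. The sketch below assumes an x86 machine; the function twice and the names code.att.s and code.intel.s are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

cat > code.c <<'EOF'
int twice(int x)
{
    return x + x;
}
EOF

# -o keeps the two dumps apart; without it both commands would write code.s.
gcc -S -masm=att   code.c -o code.att.s
gcc -S -masm=intel code.c -o code.intel.s

# Show where the two listings differ.
# diff exits non-zero when the files differ, which is expected here.
diff code.att.s code.intel.s || true
```

The Intel listing should start with an .intel_syntax noprefix directive, which tells the assembler how to read the rest of the file.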
To get a similar effect with objdump, use the --disassembler-options=intel or --disassembler-options=att switch. Note that objdump operates on a compiled object file or executable, not on the C source. An example, with code dumps to illustrate the differences in syntax:
$ objdump -d --disassembler-options=att code
080483c4 <main>:
80483c4: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483c8: 83 e4 f0 and $0xfffffff0,%esp
80483cb: ff 71 fc pushl -0x4(%ecx)
80483ce: 55 push %ebp
80483cf: 89 e5 mov %esp,%ebp
80483d1: 51 push %ecx
80483d2: 83 ec 04 sub $0x4,%esp
80483d5: c7 04 24 b0 84 04 08 movl $0x80484b0,(%esp)
80483dc: e8 13 ff ff ff call 80482f4 <puts@plt>
80483e1: b8 00 00 00 00 mov $0x0,%eax
80483e6: 83 c4 04 add $0x4,%esp
80483e9: 59 pop %ecx
80483ea: 5d pop %ebp
80483eb: 8d 61 fc lea -0x4(%ecx),%esp
80483ee: c3 ret
80483ef: 90 nop
and
$ objdump -d --disassembler-options=intel code
080483c4 <main>:
80483c4: 8d 4c 24 04 lea ecx,[esp+0x4]
80483c8: 83 e4 f0 and esp,0xfffffff0
80483cb: ff 71 fc push DWORD PTR [ecx-0x4]
80483ce: 55 push ebp
80483cf: 89 e5 mov ebp,esp
80483d1: 51 push ecx
80483d2: 83 ec 04 sub esp,0x4
80483d5: c7 04 24 b0 84 04 08 mov DWORD PTR [esp],0x80484b0
80483dc: e8 13 ff ff ff call 80482f4 <puts@plt>
80483e1: b8 00 00 00 00 mov eax,0x0
80483e6: 83 c4 04 add esp,0x4
80483e9: 59 pop ecx
80483ea: 5d pop ebp
80483eb: 8d 61 fc lea esp,[ecx-0x4]
80483ee: c3 ret
80483ef: 90 nop
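Since objdump reads compiled object files or executables rather than C source, a full round trip might look like the sketch below. It assumes an x86 machine with GNU binutils; the executable name code and the .dis output names are chosen here for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

cat > code.c <<'EOF'
#include <stdio.h>
int main(void)
{
    puts("test");
    return 0;
}
EOF

# Build an executable first; objdump disassembles the binary, not code.c.
gcc code.c -o code

# Same machine code, two renderings.
objdump -d --disassembler-options=att   code > code.att.dis
objdump -d --disassembler-options=intel code > code.intel.dis

# Show the first few differences between the two renderings.
diff code.att.dis code.intel.dis | head -n 20
```

The addresses and instruction bytes match in both files; only the mnemonics and operand order differ.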
godbolt is a very useful tool. Its list only has C++ compilers, but you can use the -x c flag to get it to treat the code as C. It will then generate an assembly listing for your code side by side, and you can use the Colourise option to generate colored bars that visually indicate which source code maps to the generated assembly. For example, the following code:
#include <stdio.h>
void func()
{
printf( "hello world\n" ) ;
}
using the following command line:
-x c -std=c99 -O3
and the Colourise option would generate a colour-coded assembly listing (the screenshot is not included here).
Did you try gcc -S -fverbose-asm -O source.c and then look into the generated source.s assembler file?
The generated assembler code goes into source.s (you could override that with -o assembler-filename); the -fverbose-asm option asks the compiler to emit some assembler comments "explaining" the generated assembler code. The -O option asks the compiler to optimize a bit (it could optimize more with -O2 or -O3).
If you want to understand what gcc is doing, try passing -fdump-tree-all, but be cautious: you'll get hundreds of dump files.
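Given the number of files involved, it may be wise to try this in a scratch directory first. A small sketch, assuming a real GCC (not a clang wrapper named gcc); the function add is invented for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir"

cat > source.c <<'EOF'
int add(int a, int b)
{
    return a + b;
}
EOF

# Each tree-level pass writes its own dump file next to the output.
gcc -O2 -c -fdump-tree-all source.c

# Count the files in the directory; the exact number depends on the GCC version.
ls | wc -l
```

The dump file names include the pass name, so listing the directory gives a rough picture of the order in which the passes ran.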
BTW, GCC is extensible through plugins or with MELT (a high-level domain-specific language to extend GCC, which I abandoned in 2017).