I'm trying to learn a bit about assembly. I decided to start by looking at the assembly files generated from simple source code. Of course, I get bombarded by instructions whose meaning I don't know, so I start searching for them on the internet. While searching, I realized that I have no idea which assembly language I'm looking for.
Is there a way to know which assembly language gcc generates? Does this question even make sense? I am mainly interested in the assembly that my system accepts (or however I should phrase that). See below for the code generated by gcc.
If you realize which knowledge gaps I have, please link the relevant documents to read/study.
System:
OS: Windows 10 Pro
Processor: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz 2.20 GHz
Type: 64-bit operating system, x64-based processor
//test.c
int main(){
    int x = 2;
    return 0;
}
//test.s
    .file "test.c"
    .text
    .def __main; .scl 2; .type 32; .endef
    .globl main
    .def main; .scl 2; .type 32; .endef
    .seh_proc main
main:
    pushq %rbp
    .seh_pushreg %rbp
    movq %rsp, %rbp
    .seh_setframe %rbp, 0
    subq $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    call __main
    movl $2, -4(%rbp)
    movl $0, %eax
    addq $48, %rsp
    popq %rbp
    ret
    .seh_endproc
    .ident "GCC: (Rev10, Built by MSYS2 project) 10.2.0"
GCC always produces asm output that the GNU assembler can assemble, on any platform. (GAS / GNU as is part of GNU Binutils, along with tools like ld, a linker.)
In your case, the target is x86-64 Windows (probably from x86_64-w64-mingw32-gcc), and the instruction syntax is AT&T syntax (the GCC and GAS default for x86, including x86-64). The comment character is # in GAS for x86 (including x86-64).
Anything starting with a . is a directive. Some, like .globl main (which exports the symbol main so that it is visible in the .o for linking), are common to GAS on every target; check the GAS manual.
SEH directives like .seh_setframe %rbp, 0 are stack-unwind metadata for Structured Exception Handling, specific to Windows object-file formats. (You can 100% ignore them until/unless you want to learn how backtraces and exception handling work under the hood without relying on a chain of legacy frame pointers. AFAIK, it's basically equivalent to the ELF/Linux .eh_frame metadata generated from .cfi directives.)
In fact you can ignore almost all the directives. The only really important ones are section directives like .text vs. .data, and .globl is somewhat important because it makes linking work. That's why https://godbolt.org/ filters directives by default.
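To connect those few directives to C source, here is a minimal sketch. The names counter, helper and bump are hypothetical, and the comments describe what I would expect in unoptimized GCC output, so check the .s file from your own build rather than taking them as given.

```c
/* sections.c (hypothetical file name) */

/* Initialized, non-const global: expected in the .data section,
   with a ".globl counter" directive so other object files can see it. */
int counter = 5;

/* static function: internal linkage, so no .globl directive is expected.
   With optimization enabled it may be inlined and vanish entirely. */
static int helper(int x)
{
    return x + counter;
}

/* Ordinary function: code goes in .text, and ".globl bump" exports it. */
int bump(int x)
{
    counter = helper(x);
    return counter;
}
```

Compiling with gcc -S sections.c and searching the output for .globl, .text and .data is a quick way to see which lines actually matter.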
You can use gcc -masm=intel if you want Intel syntax / mnemonics which you can look up in Intel's manuals. (https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html / https://www.felixcloutier.com/x86/). See also How to remove "noise" from GCC/clang assembly output?. (gcc -O1 -fverbose-asm might be interesting.)
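If you try optimized output, note that a main with an unused local gives the optimizer nothing to keep. A small function that takes arguments and returns a value is usually a better subject. A minimal sketch, with a hypothetical file and function name:

```c
/* square_sum.c (hypothetical file name) */

/* The result depends on both arguments and is returned to the caller,
   so the compiler cannot optimize the arithmetic away. */
int square_sum(int a, int b)
{
    int s = a + b;
    return s * s;
}
```

Something like gcc -O1 -S -masm=intel -fverbose-asm square_sum.c should write square_sum.s in Intel syntax, with comments naming the C variables involved.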
If you want to learn AT&T syntax, see https://stackoverflow.com/tags/att/info. The GAS manual also has a page about AT&T vs. Intel syntax, but it's not written as a tutorial, i.e. it assumes you know how x86 instructions work, and are looking for details on the syntax GAS uses to describe them: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
(Keep in mind that the CPU actually runs machine code, and it doesn't matter how the bytes get into memory, just that they do. So different assemblers (like NASM vs. GAS) and different syntaxes (like .intel_syntax noprefix) ultimately have the same limitations on what the machine can or cannot do in one instruction. All mainstream assemblers let you express pretty much everything every instruction can do; it's just a matter of knowing the syntax for immediates, addressing modes, and so on. Intel's and AMD's manuals document exactly what the CPU can do, using Intel syntax but not nailing down the details of syntax or directives.)
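As an illustration of "the same bytes, whatever the syntax", the sketch below holds the machine code that the store movl $2, -4(%rbp) from the question should assemble to, and picks its fields apart. The array and helper names are hypothetical, and the byte values are my reading of the x86-64 encoding (opcode C7 /0 with an 8-bit displacement); verify them with objdump -d on your own object file.

```c
#include <stddef.h>
#include <stdint.h>

/* AT&T:  movl $2, -4(%rbp)
   Intel: mov DWORD PTR [rbp-4], 2
   Both spellings describe the same 7 bytes:
   opcode C7, ModRM 45, disp8 FC, then a 32-bit immediate. */
static const uint8_t store_two[] = { 0xC7, 0x45, 0xFC, 0x02, 0x00, 0x00, 0x00 };

/* Read the signed 8-bit displacement at bytes[offset]. */
int read_disp8(const uint8_t *bytes, size_t offset)
{
    return (int8_t)bytes[offset];
}

/* Read a little-endian 32-bit immediate starting at bytes[offset]. */
uint32_t read_imm32(const uint8_t *bytes, size_t offset)
{
    return (uint32_t)bytes[offset]
         | ((uint32_t)bytes[offset + 1] << 8)
         | ((uint32_t)bytes[offset + 2] << 16)
         | ((uint32_t)bytes[offset + 3] << 24);
}
```

Calling read_disp8(store_two, 2) and read_imm32(store_two, 3) recovers the -4 and the 2 that appear in both the AT&T and the Intel spelling of the instruction.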
Resources (including some linked above):
Is there a way to know which assembly language gcc generates?
Yes: the one for your target port, which appears to be x86. This assembler language in turn comes in various flavours and dialects, with tons of history: https://en.wikipedia.org/wiki/X86_assembly_language
Of course, I get bombarded by instructions that I have no idea what they mean
Reading C compiler-generated assembler is much harder than reading hand-coded assembler. I'd recommend starting with some assembler tutorials whose code examples were written by humans instead.
x86 is also perhaps the hardest one of them all because of all the flavours, and because of the complexity of the core. It's generally recommended to learn some simple assembler first to get the hang of it.
8-bit microcontrollers are a good place to start.