What assembly language does gcc produce on my system?

I'm trying to learn a bit about assembly. I decided to start by looking at the assembly files generated from simple source code. Of course, I get bombarded by instructions that I have no idea what they mean, and I start to search for their meaning on the internet. While searching, I realized that I have no idea which assembly language I should be looking for.

Is there a way to know which assembly language gcc generates? Does this question even make sense? I am mainly interested in the assembly that my system accepts (or however I should phrase that). See below for the code generated by gcc.

If you realize which knowledge gaps I have, please link the relevant documents to read/study.

System:

OS: Windows 10 Pro

Processor: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz 2.20 GHz

Type: 64-bit operating system, x64-based processor

// test.c
int main(){
    int x = 2;
    return 0;
}

// test.s
    .file   "test.c"
    .text
    .def    __main; .scl    2;  .type   32; .endef
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    call    __main
    movl    $2, -4(%rbp)
    movl    $0, %eax
    addq    $48, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (Rev10, Built by MSYS2 project) 10.2.0"
JustANoob asked Apr 27 '21 09:04



2 Answers

GCC always produces asm output that the GNU assembler can assemble, on any platform. (GAS / GNU as is part of GNU Binutils, along with tools like ld, a linker.)

In your case, the target is x86-64 Windows (probably from x86_64-w64-mingw32-gcc), and the instruction syntax is AT&T syntax (the GCC and GAS default for x86, including x86-64).

The comment character is # in GAS for x86 (including x86-64).
Anything starting with a . is a directive. Some, like .globl main (which exports the symbol main so that it is visible in the .o for linking), are common to GAS on all targets; check the GAS manual.

SEH directives like .seh_setframe %rbp, 0 emit stack-unwind metadata for Structured Exception Handling, which is specific to Windows object-file formats. (You can 100% ignore them until/unless you want to learn how backtraces and exception handling work under the hood without relying on a chain of legacy frame pointers. AFAIK, it's basically equivalent to the ELF/Linux .eh_frame metadata generated from .cfi directives.)

In fact you can ignore almost all the directives. The only really important ones are section directives like .text vs. .data, plus .globl, which matters for making linking work. That's why https://godbolt.org/ filters directives by default.


You can use gcc -masm=intel if you want Intel syntax / mnemonics which you can look up in Intel's manuals. (https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html / https://www.felixcloutier.com/x86/). See also How to remove "noise" from GCC/clang assembly output?. (gcc -O1 -fverbose-asm might be interesting.)
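As a minimal sketch of something to feed those options (the file name scale.c and the function scale_and_add are made up for illustration, not taken from the question): a function that takes arguments and returns a value usually gives more instructive output than a main with an unused local, because with optimization enabled an unused int x = 2; is simply removed.

```c
/* scale.c (hypothetical file name)
 * Compile to assembly with, for example:
 *   gcc -O1 -masm=intel -fverbose-asm -S scale.c
 * and then read the generated scale.s. */
int scale_and_add(int a, int b)
{
    /* The result depends on both arguments, so the compiler cannot
     * fold the body into a constant or delete it as unused. */
    return a * 4 + b;
}
```

Dropping -masm=intel from the command line gives the same code in the default AT&T syntax, which makes it easy to compare the two spellings side by side.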

If you want to learn AT&T syntax, see https://stackoverflow.com/tags/att/info. The GAS manual also has a page about AT&T vs. Intel syntax, but it's not written as a tutorial, i.e. it assumes you know how x86 instructions work, and are looking for details on the syntax GAS uses to describe them: https://sourceware.org/binutils/docs/as/i386_002dVariations.html
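For a first feel of the AT&T conventions, here is a small sketch using GNU C inline assembly. It assumes GCC or clang targeting x86 or x86-64 with the default AT&T dialect; other compilers and targets take the plain-C fallback. The function name add_two is hypothetical. It shows three things that tend to trip up readers coming from Intel-syntax references: operands are written source first and destination last, immediates take a $ prefix, and mnemonics carry an operand-size suffix such as l for 32-bit.

```c
/* Returns in + 2, computed with two AT&T-syntax instructions on x86. */
int add_two(int in)
{
    int out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("movl %1, %0\n\t"  /* copy operand 1 (in) to operand 0 (out): source, then destination */
            "addl $2, %0"      /* add the immediate 2 ($ prefix) to out; the l suffix means 32-bit */
            : "=r"(out)        /* %0: output, any general-purpose register */
            : "r"(in)          /* %1: input, any general-purpose register */
            : "cc");           /* addl modifies the flags register */
#else
    out = in + 2;              /* portable fallback for other compilers or targets */
#endif
    return out;
}
```

In the compiler's output, %0 and %1 are replaced by real register names with a % prefix, such as %eax. In Intel syntax the same two instructions would be written destination first, as mov dst, src and add dst, 2.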

(Keep in mind that the CPU actually runs machine code, and it doesn't matter how the bytes get into memory, just that they do. So different assemblers (like NASM vs. GAS) and different syntaxes (like .intel_syntax noprefix) ultimately have the same limitations on what the machine can do or not in one instruction. All mainstream assemblers can let you express pretty much everything every instruction can do, it's just a matter of knowing the syntax for immediates, addressing modes, and so on. Intel and AMD's manuals document exactly what the CPU can do, using Intel syntax but not nailing down the details of syntax or directives.)
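To make the "same bytes, different spelling" point concrete, here is a small sketch (the names MOV_EAX_2 and mov_eax_imm32 are made up for illustration). The five bytes b8 02 00 00 00 are the x86 encoding of "load the 32-bit constant 2 into EAX"; AT&T syntax spells that movl $2, %eax and Intel syntax spells it mov eax, 2. The code only inspects the bytes as data and does not execute them.

```c
#include <stdint.h>

/* Encoding of "movl $2, %eax" (AT&T) / "mov eax, 2" (Intel):
 * opcode 0xB8 followed by a 32-bit little-endian immediate. */
const unsigned char MOV_EAX_2[5] = { 0xB8, 0x02, 0x00, 0x00, 0x00 };

/* If p points at a 5-byte "mov eax, imm32" instruction, store the
 * immediate in *imm and return 1; otherwise return 0. */
int mov_eax_imm32(const unsigned char *p, uint32_t *imm)
{
    if (p[0] != 0xB8)
        return 0;
    /* Reassemble the immediate from its little-endian byte order. */
    *imm = (uint32_t)p[1]
         | ((uint32_t)p[2] << 8)
         | ((uint32_t)p[3] << 16)
         | ((uint32_t)p[4] << 24);
    return 1;
}
```

Whichever assembler and syntax produced those bytes, the CPU sees only the bytes themselves, which is why the Intel and AMD manuals remain the reference even when you write AT&T syntax.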


Resources (including some linked above):

  • Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” and How to remove "noise" from GCC/clang assembly output?
  • The x86 tag wiki
  • AT&T syntax tag wiki: https://stackoverflow.com/tags/att/info
  • GAS manual: https://sourceware.org/binutils/docs/as/
  • Intel manuals: https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html
  • AMD vol. 3 manual (general-purpose instructions): https://support.amd.com/TechDocs/24594.pdf
Peter Cordes answered Oct 20 '22 10:10



Is there a way to know which assembly language gcc generates?

Yes: the one for your target port, which appears to be x86. This assembler language in turn comes in various flavours and dialects, with tons of history: https://en.wikipedia.org/wiki/X86_assembly_language

Of course, I get bombarded by instructions that I have no idea what they mean

Reading C compiler-generated assembler is much harder than reading hand-coded assembler. I'd recommend starting with some assembler tutorials whose code examples were written by humans instead.

x86 is also perhaps the hardest one of them all because of all the flavours, and because of the complexity of the core. It's generally recommended to learn some simple assembler first to get the hang of it.

8-bit microcontrollers are a good place to start.

Lundin answered Oct 20 '22 09:10
