
Why does GCC on x86-64 insert a NOP inside of a function?

Given the following C function:

#include <string.h>

void go(char *data) {
    char name[64];
    strcpy(name, data);
}

GCC 5 and 6 on x86-64 compile this (plain gcc -c -g -o, followed by objdump) to:

0000000000000000 <go>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 50             sub    $0x50,%rsp
   8:   48 89 7d b8             mov    %rdi,-0x48(%rbp)
   c:   48 8b 55 b8             mov    -0x48(%rbp),%rdx
  10:   48 8d 45 c0             lea    -0x40(%rbp),%rax
  14:   48 89 d6                mov    %rdx,%rsi
  17:   48 89 c7                mov    %rax,%rdi
  1a:   e8 00 00 00 00          callq  1f <go+0x1f>
  1f:   90                      nop
  20:   c9                      leaveq 
  21:   c3                      retq   

Is there any reason for GCC to insert the 90/nop at 1f or is that just a side-effect that might happen when no optimizations are turned on?

Note: This question differs from most others about nops because it asks about a nop inside a function body, not padding between functions.

Compiler versions tested: GCC Debian 5.3.1-14 (5.3.1) and Debian 6-20160313-1 (6.0.0)

Asked by Thomas Luzat, Apr 15 '16 at 11:04


1 Answer

That's weird; I'd never noticed stray nops in the asm output at -O0 before (probably because I don't waste my time looking at un-optimized compiler output).

Usually nops inside functions are there to align branch targets, including function entry points like in the question Brian linked. (Also see -falign-loops in the gcc docs, which is enabled at -O2 and -O3 but not at -Os.)
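To make the alignment case concrete, here is a minimal sketch that asks for an aligned function entry point explicitly. It assumes GCC or Clang, since __attribute__((aligned)) on a function is a compiler extension; aligned_entry and entry_misalignment are hypothetical names made up for this illustration, not anything from the question.

```c
#include <stdint.h>

/* GCC/Clang extension (an assumption of this sketch): request that the
   first instruction of the function sit on a 64-byte boundary. Any gap
   in front of it is filled with padding by the toolchain. */
__attribute__((aligned(64)))
int aligned_entry(int x)
{
    return x + 1;
}

/* Returns the entry address modulo 64; 0 means the request was honoured.
   Converting a function pointer to an integer is implementation-defined,
   but it works with GCC and Clang on x86-64. */
uintptr_t entry_misalignment(void)
{
    return (uintptr_t)&aligned_entry % 64;
}
```

That kind of padding sits in front of a branch target. The nop in the question sits right after a call and is not in front of any branch target, so it doesn't fit this pattern.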


In this case, the nop is part of the compiler noise for a bare empty function:

void go(void) {
    //char name[64];
    //strcpy(name, data);
}

    push    rbp
    mov     rbp, rsp
    nop                     # only present for gcc5, not gcc 4.9.3
    pop     rbp
    ret

See that code in the Godbolt Compiler Explorer so you can check the asm for other compiler versions and compile options.

(Not technically noise, but -O0 enables -fno-omit-frame-pointer, and at -O0 even empty functions set up and tear down a stack frame.)


Of course, that nop is not present at any non-zero optimization level. There's no debugging or performance advantage to that nop in the code in the question. (See the performance guide links in the x86 tag wiki, esp. Agner Fog's microarchitecture guide to learn about what makes code fast on current CPUs.)

My guess is that it's purely an artifact of gcc internals. The nop appears as a literal nop instruction in the gcc -S asm output, not as a .p2align directive, so it isn't alignment padding. gcc itself doesn't count machine-code bytes; it just emits alignment directives at certain points to align important branch targets. Only the assembler knows how big a nop is actually needed to reach the given alignment.

The default -O0 tells gcc that you want it to compile fast and not make good code. This means the asm output tells you more about gcc internals than other -O levels, and very little about how to optimize or anything else.

If you're trying to learn asm, it's more interesting to look at the code at -Og, for example (optimize for debugging).

If you're trying to see how well gcc or clang do at making code, you should look at -O3 -march=native (or -O2 -mtune=intel, or whatever settings you build your project with). Puzzling out the optimizations made at -O3 is a good way to learn some asm tricks, though. -fno-tree-vectorize is handy if you want to see a non-vectorized version of something fully optimized other than that.
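As a rough sketch of the kind of function where those options make a visible difference, consider a simple integer reduction. sum_ints is a hypothetical name chosen for this illustration, and whether a given compiler version actually vectorizes it is something to check in its own asm output rather than take on faith.

```c
#include <stddef.h>

/* Sums n ints. A plain reduction loop like this is a typical
   auto-vectorization candidate at -O3; with -fno-tree-vectorize the
   loop should stay scalar but still be optimized otherwise. */
int sum_ints(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Compiling this with gcc -O3 -S and again with gcc -O3 -fno-tree-vectorize -S, then comparing the two outputs, shows what the vectorizer contributes.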

Answered by Peter Cordes, Oct 21 '22 at 10:10