Why does GCC on x86-64 insert a NOP inside of a function?

Given the following C function:

#include <string.h>  /* declares strcpy */

void go(char *data) {
    char name[64];
    strcpy(name, data);
}

GCC 5 and 6 on x86-64 compile this (plain gcc -c -g -o, followed by objdump) to:

0000000000000000 <go>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 50             sub    $0x50,%rsp
   8:   48 89 7d b8             mov    %rdi,-0x48(%rbp)
   c:   48 8b 55 b8             mov    -0x48(%rbp),%rdx
  10:   48 8d 45 c0             lea    -0x40(%rbp),%rax
  14:   48 89 d6                mov    %rdx,%rsi
  17:   48 89 c7                mov    %rax,%rdi
  1a:   e8 00 00 00 00          callq  1f <go+0x1f>
  1f:   90                      nop
  20:   c9                      leaveq 
  21:   c3                      retq

Is there any reason for GCC to insert the 90/nop at 1f or is that just a side-effect that might happen when no optimizations are turned on?

Note: This question differs from most others because it asks about a nop inside a function body, not padding outside of one.

Compiler versions tested: GCC Debian 5.3.1-14 (5.3.1) and Debian 6-20160313-1 (6.0.0)

asked Apr 15 '16 11:04

Thomas Luzat

1 Answer

That's weird, I'd never noticed stray nops in the asm output at -O0 before. (Probably because I don't waste my time looking at un-optimized compiler output).

Usually nops inside functions are there to align branch targets, including function entry points like in the question Brian linked. (Also see -falign-loops in the gcc docs, which is enabled at -O2 and -O3 but not at -Os.)
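
As a rough illustration of entry-point alignment, the sketch below asks the compiler to align one function to 32 bytes and then checks the resulting address. It relies on the aligned and noinline function attributes, which are GCC/Clang extensions. The names aligned_entry and entry_is_aligned and the 32-byte figure are made up for this example and do not come from the question.

```c
#include <stdint.h>

/* GCC/Clang extension: request a minimum alignment for the function's
   entry point. 32 bytes is an arbitrary choice for this sketch. */
__attribute__((aligned(32), noinline))
int aligned_entry(int x)
{
    return x + 1;
}

/* Returns 1 if the function's address is a multiple of `alignment`.
   Converting a function pointer to an integer is implementation-defined,
   but GCC on x86-64 gives the plain code address. */
int entry_is_aligned(int (*fn)(int), uintptr_t alignment)
{
    return ((uintptr_t)fn % alignment) == 0;
}
```

With a GNU toolchain, entry_is_aligned(aligned_entry, 32) should return 1, and objdump -d may show nop padding in front of aligned_entry when the preceding code does not already end on a 32-byte boundary.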

In this case, the nop is part of the compiler noise for a bare empty function:

void go(void) {
    //char name[64];
    //strcpy(name, data);
}

    push    rbp
    mov     rbp, rsp
    nop                     # only present for gcc5, not gcc 4.9.3
    pop     rbp
    ret

See that code in the Godbolt Compiler Explorer so you can check the asm for other compiler versions and compile options.

(Not technically noise, but -O0 enables -fno-omit-frame-pointer, and at -O0 even empty functions set up and tear down a stack frame.)

Of course, that nop is not present at any non-zero optimization level. There's no debugging or performance advantage to that nop in the code in the question. (See the performance guide links in the x86 tag wiki, esp. Agner Fog's microarchitecture guide to learn about what makes code fast on current CPUs.)

My guess is that it's purely an artifact of gcc internals. This nop is there as a nop in the gcc -S asm output, not as a .p2align directive. gcc itself doesn't count machine code bytes, it just uses alignment directives at certain points to align important branch targets. Only the assembler knows how big a nop is actually needed to reach the given alignment.
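
To make the difference between a literal nop and an alignment directive concrete, here is a small sketch using GCC-style inline asm (also accepted by Clang). The function names are hypothetical. The first function emits exactly one nop instruction, which shows up as nop in the gcc -S output. The second emits a .p2align directive, and it is the assembler, not the compiler, that decides how many padding bytes (possibly zero) are needed.

```c
/* Emits a literal one-instruction nop, visible as `nop` in `gcc -S` output. */
int with_explicit_nop(int x)
{
    __asm__ volatile ("nop");
    return x + 1;
}

/* Emits an alignment directive: the assembler pads with 0 to 15 bytes of
   nop instructions so that the next instruction starts on a 16-byte
   boundary. The compiler never sees the padding size. */
int with_alignment_directive(int x)
{
    __asm__ volatile (".p2align 4");
    return x + 1;
}
```

Both functions return x + 1. Comparing gcc -S output with objdump -d on the object file should show the directive in the former and the concrete padding bytes only in the latter.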

The default -O0 tells gcc that you want it to compile fast and not make good code. This means the asm output tells you more about gcc internals than other -O levels, and very little about how to optimize or anything else.
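
If you are unsure whether a given build actually used the default -O0, one possible check is the __OPTIMIZE__ predefined macro, which GCC (and Clang) define in optimizing compilations. The function name below is an assumption for illustration only.

```c
/* Returns 1 if this translation unit was built with optimization enabled,
   0 if it was built at -O0 (the default when no -O option is given). */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

Built with plain gcc this should return 0, and with gcc -O2 it should return 1.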

If you're trying to learn asm, it's more interesting to look at the code at -Og, for example (optimize for debugging).

If you're trying to see how well gcc or clang do at making code, you should look at -O3 -march=native (or -O2 -mtune=intel, or whatever settings you build your project with). Puzzling out the optimizations made at -O3 is a good way to learn some asm tricks, though. -fno-tree-vectorize is handy if you want to see a non-vectorized version of something fully optimized other than that.
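
A simple reduction loop is a convenient test case for comparing these settings. The function below is a hypothetical example, not code from the question: paste it into the Compiler Explorer and compare the output at -O0, -Og, -O3, and -O3 -fno-tree-vectorize.

```c
#include <stddef.h>

/* Sums `count` ints. At -O3, GCC will usually auto-vectorize this loop;
   adding -fno-tree-vectorize keeps the scalar but otherwise optimized
   version. */
int sum_array(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

At -O0 expect every variable to be stored to and reloaded from the stack between statements, which is the kind of noise the rest of this answer describes.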

answered Oct 21 '22 10:10

Peter Cordes

Tags: gcc, assembly, x86-64, nop
