Why does GCC pad functions with NOPs?

Tags: c, gcc, assembly, memory-alignment

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:

    int main(void)
    {
        int a = 0;
        a += 1;
        return 0;
    }

The objdump disassembly shows the code, followed by nops after the ret:

    ...
    08048394 <main>:
     8048394:       55                      push   %ebp
     8048395:       89 e5                   mov    %esp,%ebp
     8048397:       83 ec 10                sub    $0x10,%esp
     804839a:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
     80483a1:       83 45 fc 01             addl   $0x1,-0x4(%ebp)
     80483a5:       b8 00 00 00 00          mov    $0x0,%eax
     80483aa:       c9                      leave
     80483ab:       c3                      ret
     80483ac:       90                      nop
     80483ad:       90                      nop
     80483ae:       90                      nop
     80483af:       90                      nop
    ...

From what I've learned, nops do nothing, and since these come after the ret they wouldn't even be executed.

My question is: why bother? Couldn't ELF (linux-x86) work with a .text section (+main) of any size?

I'd appreciate any help, just trying to learn.

819

asked Oct 27 '11 06:10

olly

2 Answers

First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:

-falign-functions
-falign-functions=n

Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance, -falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only if this can be done by skipping 23 bytes or less.

-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.

Some assemblers only support this flag when n is a power of two; in that case, it is rounded up.

If n is not specified or is zero, use a machine-dependent default.

Enabled at levels -O2, -O3.

There could be multiple reasons for doing this, but the main one on x86 is probably this:

Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be advantageous to align critical loop entries and subroutine entries by 16 in order to minimize the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.

(Quoted from "Optimizing subroutines in assembly language" by Agner Fog.)

edit: Here is an example that demonstrates the padding:

    // align.c
    int f(void) { return 0; }
    int g(void) { return 0; }

When compiled using gcc 4.4.5 with default settings, I get:

    align.o:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <f>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   b8 00 00 00 00          mov    $0x0,%eax
       9:   c9                      leaveq
       a:   c3                      retq

    000000000000000b <g>:
       b:   55                      push   %rbp
       c:   48 89 e5                mov    %rsp,%rbp
       f:   b8 00 00 00 00          mov    $0x0,%eax
      14:   c9                      leaveq
      15:   c3                      retq

Specifying -falign-functions gives:

    align.o:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <f>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   b8 00 00 00 00          mov    $0x0,%eax
       9:   c9                      leaveq
       a:   c3                      retq
       b:   eb 03                   jmp    10 <g>
       d:   90                      nop
       e:   90                      nop
       f:   90                      nop

    0000000000000010 <g>:
      10:   55                      push   %rbp
      11:   48 89 e5                mov    %rsp,%rbp
      14:   b8 00 00 00 00          mov    $0x0,%eax
      19:   c9                      leaveq
      1a:   c3                      retq

132

answered Oct 12 '22 11:10

NPE

This is done to align the next function to an 8-, 16- or 32-byte boundary.

From “Optimizing subroutines in assembly language” by A. Fog:

11.5 Alignment of code

Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.

[...]

Aligning a subroutine entry is as simple as putting as many NOPs as needed before the subroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.

answered Oct 12 '22 12:10

hamstergene
