Responsibility of stack alignment in 32-bit x86 assembly

Tags:

linux

x86

gcc

assembly

memory-alignment

I am trying to get a clear picture of who (caller or callee) is responsible for stack alignment. The 64-bit case is fairly clear: it is the caller's responsibility.

Referring to the System V AMD64 ABI, section 3.2.2, The Stack Frame:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary.

In other words, it should be safe to assume that, on entry to every called function,

16 | (%rsp + 8)

holds, i.e. %rsp + 8 is a multiple of 16 (the extra eight is because call implicitly pushes the return address onto the stack).

What does this look like in the 32-bit world (assuming cdecl)? I noticed that gcc places the alignment inside the called function with the following construct:

and esp, -16

which seems to indicate that it is the callee's responsibility.

To make this clearer, consider the following NASM code:

global main
extern printf
extern scanf
section .rodata
    s_fmt   db "%d %d", 0
    s_res   db `%d with remainder %d\n`, 0
section .text
main:
    enter   0, 0             ; push ebp / mov ebp, esp
    sub     esp, 8
    mov     DWORD [ebp-4], 0 ; dividend
    mov     DWORD [ebp-8], 0 ; divisor

    lea     eax, [ebp-8]
    push    eax
    lea     eax, [ebp-4]
    push    eax
    push    s_fmt
    call    scanf
    add     esp, 12

    mov     eax, [ebp-4]
    cdq
    idiv    DWORD [ebp-8]

    push    edx
    push    eax
    push    s_res
    call    printf

    xor     eax, eax
    leave
    ret

Is it required to align the stack before scanf is called? If so, I would need to decrease %esp by four more bytes before pushing the arguments for scanf, because by the time of the call the stack holds:

4 bytes (return address)
4 bytes (%ebp of previous stack frame)
8 bytes (for two variables)
12 bytes (three arguments for scanf)
= 28


asked Oct 28 '16 14:10

Grzegorz Szpetkowski

1 Answer

GCC only does this extra stack alignment in main; that function is special. You won't see it in the code-gen for any other function unless that function has a local with alignas(32) or something similar.

GCC is just being defensive with -m32 by not assuming that main is called with a properly 16B-aligned stack. Or perhaps this special treatment is left over from when -mpreferred-stack-boundary=4 was only a good idea, not the law.

The i386 System V ABI has guaranteed/required for years that ESP+4 is 16B-aligned on entry to a function. (i.e. ESP must be 16B-aligned before a CALL instruction, so args on the stack start at a 16B boundary. This is the same as for x86-64 System V.)

The ABI also guarantees that new 32-bit processes start with ESP aligned on a 16B boundary (e.g. at _start, the ELF entry point, where ESP points at argc, not a return address), and the glibc CRT code maintains that alignment.

As far as the calling convention is concerned, EBP is just another call-preserved register. But yes, compiler output with -fno-omit-frame-pointer does take care to push ebp before other call-preserved registers (like EBX) so the saved EBP values form a linked list. (Because it also does the mov ebp, esp part of setting up a frame pointer after that push.)

Perhaps gcc is defensive because an extremely ancient Linux kernel (from before that revision to the i386 ABI, when the required alignment was only 4B) could violate that assumption, and it's only an extra couple of instructions that run once in the lifetime of the process (assuming the program doesn't call main recursively).

Unlike gcc, clang assumes the stack is properly aligned on entry to main. (clang also assumes that narrow args have been sign- or zero-extended to 32 bits, even though the current ABI revision doesn't specify that behaviour (yet). gcc and clang both emit code that does this extension on the caller side, but only clang depends on it in the callee. This happens in 64-bit code, but I didn't check 32-bit.)

Look at compiler output on http://gcc.godbolt.org/ for main and functions other than main if you're curious.

I just updated the ABI links in the x86 tag wiki the other day. http://x86-64.org/ is still dead and doesn't seem to be coming back, so I updated the System V links to point to the PDFs of the current revision in HJ Lu's github repo, and his page with links.

Note that the last version on SCO's site is not the current revision, and doesn't include the 16B-stack-alignment requirement.

I think some BSD versions still don't require / maintain 16-byte stack alignment.

answered Oct 07 '22 22:10

Peter Cordes
