In the prologue of the main function (of a simple toy program) that was compiled using gcc -g -o program -m32 program.c on a 64-bit machine (running Ubuntu 14.04), I get the following disassembly:
Dump of assembler code for function main:
0x08048e24 <+0>: push %ebp
0x08048e25 <+1>: mov %esp,%ebp
0x08048e27 <+3>: and $0xfffffff0,%esp
...
What's the purpose of the instruction at <+3>?
That is, why should %esp point to a 16-byte-aligned address?
Alignment restricts the addresses at which data can be placed, and it is not specific to stacks. For example, 4-byte alignment means that the lowest 2 bits of every such address are 0. The alignment often corresponds to the width of the hardware memory bus, which can be several bytes.
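The low-bits rule above can be checked in a few lines of C. This is only a sketch: is_aligned is a hypothetical helper name, not something from the question or from any library, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align.
   align must be a power of two, so (align - 1) is a mask that
   selects exactly the low bits that have to be zero. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

For 4-byte alignment the mask is 0x3 (the lowest 2 bits); for 16-byte alignment it is 0xf (the lowest 4 bits).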
"Stack alignment" just means the address of the stack (SP or ESP) is a multiple of the machine word size (so always divisible by 8 for 64-bit mode, 4 for 32-bit, 2 for 16-bit).
The compiler maintains 16-byte alignment of the stack pointer at every function call, adding padding to the stack as necessary. Because the compiler knows that the stack will always be aligned correctly, it can emit instructions with alignment requirements without risk of triggering their fault conditions.
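As a rough model of what and $0xfffffff0,%esp does, the C sketch below applies the same mask to a 32-bit value. The function names and the sample stack pointer values are made up for illustration. Because the stack grows towards lower addresses, rounding down only skips unused bytes and never overlaps data already pushed.

```c
#include <assert.h>
#include <stdint.h>

/* Round a 32-bit stack pointer value down to a multiple of align
   (a power of two). For align == 16 the mask ~(align - 1) is
   0xfffffff0, the constant used by the and instruction. */
static uint32_t align_down(uint32_t sp, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Number of bytes skipped (left as padding) by the rounding. */
static uint32_t padding_skipped(uint32_t sp, uint32_t align)
{
    return sp - align_down(sp, align);
}
```

With a hypothetical incoming value of 0xffffd0cc, align_down(0xffffd0cc, 16) gives 0xffffd0c0 and 12 bytes are skipped; a value that is already a multiple of 16 is left unchanged.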
The stack pointer must always be aligned on a 16-byte boundary in AArch64 mode. This instruction subtracts from the address in the frame pointer register and stores the result in register r3, ready to be passed to the read function.
Modern versions of the i386 System V ABI have the same 16-byte stack alignment requirement / guarantee as x86-64 System V (which @ouah's answer mentions).
This includes a guarantee that the kernel will have aligned %esp by 16 at _start. So CRT startup code that also maintains 16-byte alignment will call main with the stack 16-byte aligned.
Historically, the i386 System V ABI only required 4-byte stack alignment, and aligning the stack by 16 was just something compilers could choose to do; GCC defaulted to -mpreferred-stack-boundary=4 when it was just a good idea, not the law (on MacOS and Linux).
Some BSD versions, I think, still don't require 16-byte stack alignment in 32-bit code, so 32-bit code that wants to use aligned memory for a double, int64_t, or especially an XMM vector does need to manually align the stack instead of relying on incoming stack alignment.
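In C, one way to ask for such an aligned local is the C11 _Alignas specifier, leaving it to the compiler to realign the stack in the function prologue if it cannot rely on the incoming alignment. The sketch below assumes a C11 compiler such as GCC or Clang; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical demo: request a 16-byte-aligned local (the size of one
   XMM vector) and report whether its address is a multiple of 16.
   volatile keeps the array in memory so it really has an address. */
static bool local_is_16_byte_aligned(void)
{
    _Alignas(16) volatile float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    return ((uintptr_t)&v[0] & 15u) == 0;
}
```

Hand-written assembly gets no such help and has to do the equivalent of and $0xfffffff0,%esp itself.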
But even on modern Linux, GCC's 32-bit-mode (-m32) behaviour for main doesn't assume that main's caller (or the kernel) follows the ABI, so it manually aligns the stack itself. That is the and $0xfffffff0,%esp in the question.
See Responsibility of stack alignment in 32-bit x86 assembly for more; another question where the obsolete instruction led to confusion based on the assumption that it was needed.
GCC on x86-64 does not do this, and does just take advantage of the fact that 16-byte stack alignment has always been a requirement in the x86-64 System V ABI. (And the Windows x64 ABI).
The System V AMD64 ABI (x86-64 ABI) requires 16-byte stack alignment. double requires 8-byte alignment and SSE extensions require 16-byte alignment.
The gcc documentation points this out in its description of the -mpreferred-stack-boundary option:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
Warning: When generating code for the x86-64 architecture with SSE extensions disabled, -mpreferred-stack-boundary=3 can be used to keep the stack boundary aligned to 8 byte boundary. Since x86-64 ABI require 16 byte stack alignment, this is ABI incompatible and intended to be used in controlled environment where stack space is important limitation. This option leads to wrong code when functions compiled with 16 byte stack alignment (such as functions from a standard library) are called with misaligned stack. In this case, SSE instructions may lead to misaligned memory access traps. In addition, variable arguments are handled incorrectly for 16 byte aligned objects (including x87 long double and __int128), leading to wrong results. You must build all modules with -mpreferred-stack-boundary=3, including any libraries. This includes the system libraries and startup modules.