I'm reading the Intel manual about stack frames. It notes that:
The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary.
I don't quite understand what this means. Does it mean that rsp should always point to a 16-byte-aligned address?
I tried to experiment with it and wrote a very simple program:
section .text
global _start
_start:
push byte 0xFF
;SYS_exit syscall
I ran it with gdb and noted that before executing the push instruction, rsp = 0x7fffffffdcf0, which is indeed 16-byte aligned. x/1xg $rsp returned 0x0000000000000001.
After the push, rsp became 0x7fffffffdce8. Is that a violation of the alignment requirements?
I also noticed that x/1xg $rsp now returned 0xffffffffffffffff. That means every bit of the 8 bytes at the top of the stack was set to 1, not just the one byte specified in the push instruction. Why? I expected the output of x/1xg $rsp after the push to be 0x00000000000000FF (we pushed just one byte).
An aligned access is an operation where a word-aligned address is used for a word, dual word, or multiple word access, or where a halfword-aligned address is used for a halfword access. Byte accesses are always aligned.
Certain SIMD instructions, which perform the same instruction on multiple data, require that the memory address of this data is aligned to a certain byte boundary. This effectively means that the address of the memory your data resides in needs to be divisible by the number of bytes required by the instruction.
rsp % 16 == 0 at _start, which is the OS entry point. It's not a function: there's no return address on the stack, and RSP points at argc instead.
Unlike in functions, RSP is aligned by 16 on entry to _start, as specified by the x86-64 System V ABI.
From _start, you're ready to call a function right away without adjusting the stack, because the stack must be 16-byte aligned before a call. The call itself pushes an 8-byte return address, so you can expect rsp % 16 == 8 on entry to the callee, one more push away from 16-byte alignment. That's guaranteed on entry to any function (see footnote 1).
On application entry, you can trust the kernel to give you a 16-byte-aligned RSP, or you can align the stack manually with and rsp, -16 before calling any other ABI-conforming code. (Or, if you plan to use the C runtime library, make main the entry point of your own code and let libc's CRT startup code run as _start. main is a normal function like any other, so RSP & 0xF == 0x8 on entry to it when it's eventually called.)
Footnote 1: Unless you build with special options that change the ABI, like -mpreferred-stack-boundary=3 instead of the default 4. But that would make it unsafe to call functions in any code compiled without that option. For an example, see "glibc scanf Segmentation faults when called from a function that doesn't align RSP".
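The alignment bookkeeping described above can be checked with plain integer arithmetic. This is only a toy model (Python used as a calculator, no real stack involved); the helper names push and call are made up for the illustration, and the starting address is the one gdb reported in the question.

```python
# Toy model of RSP arithmetic: addresses are plain integers, no memory is touched.
ENTRY_RSP = 0x7fffffffdcf0  # value of rsp at _start as observed in gdb in the question


def push(rsp: int, size: int = 8) -> int:
    """A push decrements RSP by the operand size (8 bytes for a qword push)."""
    return rsp - size


def call(rsp: int) -> int:
    """A near call pushes an 8-byte return address before jumping."""
    return push(rsp, 8)


rsp_at_start = ENTRY_RSP                  # aligned by the kernel
rsp_in_callee = call(rsp_at_start)        # after `call some_function`
rsp_after_push_rbp = push(rsp_in_callee)  # after `push rbp` in the callee

print(rsp_at_start % 16, rsp_in_callee % 16, rsp_after_push_rbp % 16)  # 0 8 0
```

So a function body sees rsp % 16 == 8 on entry, and one 8-byte push brings it back to a 16-byte boundary before it makes calls of its own.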
Now, after pushing the content of rsp became 0x7fffffffdce8. Is it a violation of the alignment requirements?
Yes. If at that point you called a more complex function, for example printf with non-trivial arguments (so that its implementation uses SSE instructions), it would very likely segfault.
About push byte 0xFF:
That's not a legal instruction in 64-bit mode (nor in 16- or 32-bit mode), in the sense that byte is not a valid operand size for push. A byte immediate is legal as the source value, but the operand size can only be 16, 32 or 64 bits. So NASM guesses the operand size (from the legal ones, naturally picking qword in 64-bit mode) and uses that guessed size together with the imm8 from the source.
BTW, use the -w+all option to make NASM emit a warning in such a case (a somewhat odd one, but at least it gives you something to investigate):
warning: signed byte value exceeds bounds
For example, the legitimate push word 0xFF pushes only two bytes onto the stack, holding the word value 0x00FF.
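To see why gdb showed 0xffffffffffffffff rather than 0x00000000000000FF, here is a small sketch of the sign extension the CPU applies to an imm8 when the push operand size is a qword. Python is used only as a calculator, and sign_extend is a helper name invented for this illustration.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide value to to_bits; result is returned as unsigned."""
    sign_bit = 1 << (from_bits - 1)
    signed = (value & (sign_bit - 1)) - (value & sign_bit)
    return signed & ((1 << to_bits) - 1)


# push byte 0xFF is assembled as 6A FF: an imm8 that is sign-extended to a qword.
qword = sign_extend(0xFF, 8, 64)
print(hex(qword))                         # 0xffffffffffffffff
print(qword.to_bytes(8, "little").hex())  # ffffffffffffffff (the 8 bytes at [rsp])

# push word 0xFF stores the 16-bit value 0x00FF, i.e. the bytes ff 00 in memory.
word = 0x00FF
print(word.to_bytes(2, "little").hex())   # ff00
```

The byte 0xFF has its top bit set, so as a signed 8-bit value it is -1, and -1 extended to 64 bits is all ones. An imm8 below 0x80 would have been zero-filled instead.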
How to align the stack: if you already know the initial alignment, just adjust as needed before calling a subroutine that requires ABI alignment (in common 64-bit code that is usually as simple as either pushing nothing or doing one extra redundant push, like push rbp).
If you are not sure about the alignment, save the original rsp in a spare register (often rbp, so that it also serves as the stack frame pointer), and then use and rsp,-16 to clear the bottom bits.
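As a rough sketch of what and rsp,-16 does to an address of unknown alignment, the masking can be modelled like this. Again this is just integer arithmetic in Python; align_down is a name chosen for the illustration, and the sample address is the misaligned value from the question.

```python
def align_down(rsp: int, alignment: int = 16) -> int:
    """Model of `and rsp, -alignment` for a power-of-two alignment."""
    mask = (1 << 64) - alignment  # -16 in 64-bit two's complement: 0xFFFFFFFFFFFFFFF0
    return rsp & mask


saved = 0x7fffffffdce8       # like `mov rbp, rsp`: keep the original value
aligned = align_down(saved)  # like `and rsp, -16`
print(hex(aligned), aligned % 16, saved - aligned)  # 0x7fffffffdce0 0 8
```

The mask can only move rsp downward (by 0 to 15 bytes), which is the safe direction on a stack that grows down. Because the amount is not known in advance, the saved copy is what lets you restore the original rsp afterwards.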
Keep in mind, when writing your own ABI-conforming subroutines, that the stack was aligned before the call, so on entry rsp is 8 bytes below a 16-byte boundary (rsp % 16 == 8). Again, a simple push rbp is often enough to resolve several issues at once: it preserves the rbp value (so mov rbp, rsp is possible "for free") and aligns the stack for the rest of the subroutine.
EDIT: about encoding, source size, and immediate size...
Unfortunately I'm not 100% sure how exactly this is supposed to be defined in NASM, but I think the push definition is complex enough that it strains NASM syntax a bit: the syntax can't express whether you mean the operand size or the size of the source immediate, so the size specifier is silently taken mainly as the operand size and affects the immediate only in certain cases.
With push byte 0xFF, NASM takes the byte part ALSO as the "operand size", not just as the immediate size. byte is not a legal operand size for push, so NASM instead chooses qword, the default in 64-bit mode. Then it also treats byte as the immediate size and sign-extends 0xFF to a qword. To me this looks like a bit of undefined behaviour. The NASM authors probably don't expect you to specify the immediate size, because NASM optimizes for size: when you write push word -1, it assembles that as "push word operand imm8". You can override that the other way, to make sure you get imm16, with push strict word -1.
See the machine code produced by the various combinations in 64-bit mode. Strictly speaking, some of them deserve at least a warning, or even an error: for example "strict qword" produces only an imm32, not an imm64 (an imm64 encoding of push does not exist, of course). Also note that the dword variants are effectively qword operand sizes, since you can't use a 32-bit operand size for push in 64-bit mode:
 6 00000000 6AFF           push -1
 7 00000002 6AFF           push strict byte 0xFF
 8          ******************  warning: signed byte value exceeds bounds
 9 00000004 6AFF           push byte 0xFF
10          ******************  warning: signed byte value exceeds bounds
11 00000006 6AFF           push strict byte -1
12 00000008 6AFF           push byte -1
13 0000000A 6668FF00       push strict word 0xFF
14 0000000E 6668FF00       push word 0xFF
15 00000012 6668FFFF       push strict word -1
16 00000016 666AFF         push word -1
17 00000019 68FF000000     push strict dword 0xFF
18 0000001E 68FF000000     push dword 0xFF
19 00000023 68FFFFFFFF     push strict dword -1
20 00000028 6AFF           push dword -1
21 0000002A 68FF000000     push strict qword 0xFF
22 0000002F 68FF000000     push qword 0xFF
23 00000034 68FFFFFFFF     push strict qword -1
24 00000039 6AFF           push qword -1
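To read the listing above, it may help to decode the byte patterns mechanically. The following is a minimal sketch, not a real disassembler: it covers only the push-immediate opcodes 6A and 68 with an optional 66 operand-size prefix, as they behave in 64-bit mode, and the function name decode_push_imm is made up for this example.

```python
def decode_push_imm(code_hex: str):
    """Decode a push-immediate encoding as it behaves in 64-bit mode.

    Returns (bytes_pushed, pushed_value_as_unsigned). Handles only opcodes
    6A (imm8) and 68 (imm16/imm32), optionally preceded by the 66 prefix.
    """
    code = bytes.fromhex(code_hex)
    size = 8                       # default operand size for push in 64-bit mode
    if code[0] == 0x66:            # operand-size prefix selects a 16-bit push
        size, code = 2, code[1:]
    opcode, imm = code[0], code[1:]
    if opcode == 0x6A:
        imm_len = 1                # imm8, sign-extended to the operand size
    elif opcode == 0x68:
        imm_len = 2 if size == 2 else 4  # imm16, or imm32 sign-extended to 64 bits
    else:
        raise ValueError("not a push-immediate opcode")
    if len(imm) != imm_len:
        raise ValueError("unexpected immediate length")
    signed = int.from_bytes(imm, "little", signed=True)
    return size, signed & ((1 << (8 * size)) - 1)


for encoding in ("6AFF", "666AFF", "6668FF00", "68FF000000", "68FFFFFFFF"):
    size, value = decode_push_imm(encoding)
    print(f"{encoding:<12} pushes {size} bytes: {value:#x}")
```

Under this model, 6AFF (what push byte 0xFF assembled to) pushes 8 bytes of all ones, while 6668FF00 (push word 0xFF) pushes 2 bytes holding 0xff, which matches the behaviour discussed in the answer.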
Anyway, I guess not too many people are bothered by this, as in 64-bit mode you usually want a qword push (rsp -= 8) with the immediate encoded in the shortest possible way, so you just write push -1 and let NASM handle the imm8 optimization itself, expecting rsp to change by -8 of course. And in the other cases, they probably expect you to know the legal operand sizes and not to use byte at all.
If you think this is not acceptable, I would raise the question of how exactly it is supposed to work on the NASM forum, bugzilla, or similar. As far as I'm personally concerned, the current behaviour is "good enough": it makes sense, and I take a quick look at the listing file from time to time to verify that there's no nasty surprise in the machine code bytes and that everything landed as expected. That said, I mostly code size-limited intros, so I know about every byte produced and its purpose. If NASM suddenly produced an imm16 instead of the expected imm8, I would see it in the binary size and investigate.