
Understanding stack alignment

Tags: assembly, x86-64, abi, memory-alignment, calling-convention

I'm reading the Intel manual's section on stack frames. It notes that

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary.

I don't quite understand what it means. Does it mean that rsp should always point to an address that is aligned to 16 bytes?

I tried to experiment with it and wrote a very simple program:

section .text
    global _start

_start:
    push byte 0xFF

    ;SYS_exit syscall

I ran it under gdb and noted that before executing the push instruction, rsp = 0x7fffffffdcf0, so it really was aligned to 16 bytes. x/1xg $rsp returned 0x0000000000000001.

Now, after pushing the content of rsp became 0x7fffffffdce8. Is it a violation of the alignment requirements?

I also noticed that x/1xg $rsp then returned 0xffffffffffffffff. That means every bit of the next 8 bytes was set to 1, not just the one byte specified in the push instruction. Why? I expected x/1xg $rsp after the push to show 0x00000000000000FF (we pushed just one byte).

965

asked Feb 08 '18 11:02

St.Antario

1 Answer

rsp % 16 == 0 at _start, which is the OS entry point, not a function: there is no return address on the stack, and RSP points at argc instead. Unlike on entry to a function, RSP is 16-byte aligned on entry to _start, as specified by the x86-64 System V ABI.

From _start, you can call a function right away without adjusting the stack, because the stack must be 16-byte aligned before call. The call itself pushes an 8-byte return address, so you can expect rsp % 16 == 8 on entry to a function, one more push away from 16-byte alignment. That is guaranteed on entry to any function¹.

On process entry, you can trust the kernel to give you a 16-byte-aligned RSP, or you can align the stack manually with and rsp, -16 before calling any other ABI-conforming code. (If you plan to use the C runtime library, your program's entry point should be main, with libc's CRT startup code running as _start. main is a normal function like any other, so RSP & 0xF == 0x8 on entry to it when it is eventually called.)
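The bookkeeping above can be modelled in a few lines of C. This is only a sketch of the arithmetic, not anything from the ABI document itself; the function names are made up for illustration, and the address is the one from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Distance of rsp above the nearest lower 16-byte boundary. */
uint64_t rsp_mod16(uint64_t rsp) { return rsp & 0xF; }

/* The rule from the text: rsp % 16 == 0 immediately before `call`. */
int ok_to_call(uint64_t rsp) { return rsp_mod16(rsp) == 0; }

/* A 64-bit push stores 8 bytes, so rsp drops by 8. */
uint64_t after_push(uint64_t rsp) { return rsp - 8; }

/* Models `call`: the caller must be aligned, then the 8-byte
   return address is pushed. Returns rsp as seen by the callee. */
uint64_t enter_function(uint64_t rsp_before_call) {
    assert(ok_to_call(rsp_before_call)); /* ABI precondition */
    return rsp_before_call - 8;
}
```

With the question's value, enter_function(0x7fffffffdcf0) gives 0x7fffffffdce8, where rsp % 16 == 8; one more after_push brings it back to a multiple of 16. A lone push at _start leaves rsp at that same misaligned value, so ok_to_call is false there.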

Footnote 1: Unless you build with special options that change the ABI, like -mpreferred-stack-boundary=3 instead of the default 4. But that would make it unsafe to call functions in any code compiled without that option. For an example, see the question "glibc scanf Segmentation faults when called from a function that doesn't align RSP".

Now, after pushing the content of rsp became 0x7fffffffdce8. Is it a violation of the alignment requirements?

Yes. If at that point you call a more complex function, for example printf with non-trivial arguments (so that its implementation uses SSE instructions), it will very likely segfault.

About push byte 0xFF:

That is not a legal instruction in 64-bit mode (nor in 16-bit or 32-bit mode). It is illegal in the sense that byte is not a valid operand size for push: a byte immediate is legal as the source value, but the operand size can only be 16, 32 or 64 bits. So NASM guesses the operand size from the legal ones, naturally picking qword in 64-bit mode, and combines that operand size with the imm8 from the source.

By the way, use the -w+all option to make NASM emit a warning in such a case (a somewhat odd one, but at least it gives you something to investigate):

warning: signed byte value exceeds bounds

For example, the legitimate push word 0xFF would push only two bytes onto the stack, the word value 0x00FF.
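A small C sketch (not part of the original answer) of why gdb showed 0xffffffffffffffff: when push is encoded with a one-byte immediate, the CPU sign-extends that byte to the operand size before storing it. The function name pushed_value_imm8 is invented for illustration, and only the 2-byte and 8-byte operand sizes available in 64-bit mode are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored by a push that carries an 8-bit immediate:
   the byte is sign-extended to the operand size (2 or 8 bytes). */
uint64_t pushed_value_imm8(uint8_t imm8, unsigned operand_bytes) {
    assert(operand_bytes == 2 || operand_bytes == 8);
    /* Sign-extend 8 -> 64 bits without relying on implementation-defined casts. */
    int64_t extended = (imm8 & 0x80) ? (int64_t)imm8 - 256 : (int64_t)imm8;
    if (operand_bytes == 2)
        return (uint64_t)extended & 0xFFFF;  /* only the low word is stored */
    return (uint64_t)extended;
}
```

So the byte 0xFF with a qword operand size becomes 0xFFFFFFFFFFFFFFFF, while 0x7F would stay 0x000000000000007F. To actually store 0x00000000000000FF, the assembler has to pick a wider immediate encoding.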

How to align the stack: if you already know the initial alignment, just adjust it as needed before calling any subroutine that requires ABI alignment. In typical 64-bit code that is usually as simple as either pushing nothing, or doing one more redundant push, like push rbp.

If you are not sure about the alignment, save the original rsp in a spare register (rbp is often used, so it doubles as the stack frame pointer), then use and rsp,-16 to clear the low four bits.

Keep in mind when writing your own ABI-conforming subroutines that the stack was aligned before the call, so on entry rsp is 8 bytes off a 16-byte boundary (rsp % 16 == 8). Again, a simple push rbp is often enough to solve several problems at once: it preserves the rbp value (so mov rbp, rsp becomes possible "for free") and aligns the stack for the rest of the subroutine.
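The and rsp, -16 step amounts to rounding an address down to a multiple of 16. A hedged C sketch of that arithmetic, with hypothetical helper names, also shows why the original rsp has to be saved first: the amount dropped by the mask depends on the run-time value.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align; for align == 16 this is
   the same mask that `and rsp, -16` applies. */
uint64_t align_down(uint64_t addr, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two only */
    return addr & ~(align - 1);
}

/* Bytes skipped by the masking. Not known at assembly time, which is
   why the old rsp is kept in a register such as rbp. */
uint64_t alignment_padding(uint64_t rsp) {
    return rsp - align_down(rsp, 16);
}
```

For the misaligned value from the question, align_down(0x7fffffffdce8, 16) is 0x7fffffffdce0 with 8 bytes of padding; an already aligned rsp is left unchanged.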

EDIT: about encoding, source size, and immediate size...

Unfortunately I'm not 100% sure how exactly this is supposed to be defined in NASM, but I think the push definition is complex enough that it strains NASM's syntax a bit: the syntax cannot express whether a size specifier means the operand size or the size of the source immediate, so NASM silently assumes the specifier is mainly the operand size, which also affects the immediate in certain cases.

With push byte 0xFF, NASM takes byte ALSO as the operand size, not just as the immediate size. byte is not a legal operand size for push, so NASM instead chooses qword, the default in 64-bit mode. It then also treats byte as the immediate size and sign-extends 0xFF to a qword. To me this looks like a bit of undefined behaviour. NASM's authors probably don't expect you to specify the immediate size, because NASM optimizes for size: when you write push word -1, it assembles that as "push with word operand size and imm8". You can override that in the other direction and force an imm16 with push strict word -1.

See the machine code produced by the various combinations in 64-bit mode. Strictly speaking, some of them deserve at least a warning, or even an error: "strict qword", for example, produces only an imm32, not an imm64 (an imm64 encoding of push does not exist, of course). Note also that the dword variants effectively have qword operand size, since you can't use a 32-bit operand size for push in 64-bit mode:

 6 00000000 6AFF                            push    -1
 7 00000002 6AFF                            push    strict byte 0xFF
 8          ******************       warning: signed byte value exceeds bounds
 9 00000004 6AFF                            push    byte 0xFF
10          ******************       warning: signed byte value exceeds bounds
11 00000006 6AFF                            push    strict byte -1
12 00000008 6AFF                            push    byte -1
13 0000000A 6668FF00                        push    strict word 0xFF
14 0000000E 6668FF00                        push    word 0xFF
15 00000012 6668FFFF                        push    strict word -1
16 00000016 666AFF                          push    word -1
17 00000019 68FF000000                      push    strict dword 0xFF
18 0000001E 68FF000000                      push    dword 0xFF
19 00000023 68FFFFFFFF                      push    strict dword -1
20 00000028 6AFF                            push    dword -1
21 0000002A 68FF000000                      push    strict qword 0xFF
22 0000002F 68FF000000                      push    qword 0xFF
23 00000034 68FFFFFFFF                      push    strict qword -1
24 00000039 6AFF                            push    qword -1
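To cross-check the listing, here is a rough C sketch that decodes just these push-immediate encodings and reports how far rsp moves and which value lands on the stack. It is not a general x86 decoder; decode_push_imm and struct push_info are names invented for this example, and it assumes 64-bit mode with at most one 66 operand-size prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct push_info {
    unsigned operand_bytes; /* how far rsp moves: 2 or 8 in 64-bit mode */
    uint64_t value;         /* value written to the stack */
    size_t   length;        /* encoded instruction length in bytes */
};

/* Decodes only the forms in the listing: [66] 6A ib and [66] 68 iw/id. */
struct push_info decode_push_imm(const uint8_t *code) {
    struct push_info info = {8, 0, 0};
    size_t i = 0;
    if (code[i] == 0x66) {          /* operand-size prefix: 16-bit push */
        info.operand_bytes = 2;
        i++;
    }
    uint8_t opcode = code[i++];
    assert(opcode == 0x6A || opcode == 0x68);
    size_t imm_bytes = (opcode == 0x6A) ? 1 : (info.operand_bytes == 2 ? 2 : 4);

    uint64_t imm = 0;
    for (size_t k = 0; k < imm_bytes; k++)      /* little-endian immediate */
        imm |= (uint64_t)code[i + k] << (8 * k);

    uint64_t sign_bit = 1ULL << (8 * imm_bytes - 1);
    if (imm & sign_bit)                          /* sign-extend to 64 bits */
        imm |= ~((sign_bit << 1) - 1);
    if (info.operand_bytes == 2)
        imm &= 0xFFFF;                           /* only a word is stored */

    info.value = imm;
    info.length = i + imm_bytes;
    return info;
}
```

Feeding it the bytes 6A FF reports an 8-byte push of 0xFFFFFFFFFFFFFFFF, 68 FF 00 00 00 an 8-byte push of 0xFF, and 66 68 FF 00 a 2-byte push of 0x00FF, which matches the operand sizes and values discussed above.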

Anyway, I guess not many people are bothered by this, because in 64-bit mode you usually want a qword push (rsp -= 8) with the immediate encoded in the shortest possible way, so you just write push -1 and let NASM handle the imm8 optimization itself, expecting rsp to change by -8, of course. Otherwise, NASM's authors probably expect you to know the legal operand sizes and not to use byte at all.

If you find this unacceptable, I would raise it on the NASM forum, bugzilla, or a similar channel and ask how it is supposed to work exactly. As far as I'm personally concerned, the current behaviour is "good enough" (it makes sense, and I also take a quick look at the listing file from time to time to verify that there is no nasty surprise in the machine code bytes and that everything assembled as expected). That said, I mostly write size-coded intros, so I know about every byte produced and its purpose. If NASM suddenly produced an imm16 instead of the expected imm8, I would see it in the binary size and investigate.

152

answered Oct 05 '22 23:10

Ped7g
