How is Stack memory allocated when using 'push' or 'sub' x86 instructions?

I have been browsing for a while and I am trying to understand how memory is allocated to the stack when doing for example:

push rax

Or moving the stack pointer to allocate space for local variables of a subroutine:

sub rsp, X    ;Move stack pointer down by X bytes

What I understand is that the stack segment is anonymous in the virtual memory space, i.e., not file-backed.

What I also understand is that the kernel will not actually map an anonymous virtual memory segment to physical memory until the program actually does something with that memory segment, i.e., writes data. So, trying to read that segment before writing to it may cause an error.

In the first example the kernel will assign a page frame in physical memory if needed. In the second example I assume that the kernel will not assign any physical memory to the stack segment until the program actually writes data to an address in the stack segment.

Am I on the right track here?

asked Oct 17 '17 12:10

deftextra

2 Answers

Yes, you're on the right track, pretty much. sub rsp, X is kind of like "lazy" allocation: the kernel only does anything after a #PF page-fault exception from touching memory above the new RSP, not from merely modifying registers. But you can still consider the memory "allocated", i.e. safe for use.

"So, trying to read that segment before writing to it may cause an error."

No, read won't cause an error. Anonymous pages that have never been written are copy-on-write mapped to a/the physical zero page, whether they're in the BSS, stack, or mmap(MAP_ANONYMOUS).
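To see this for yourself, here is a minimal sketch in C, assuming Linux and a private anonymous mapping. The helper name untouched_anon_reads_zero is made up for illustration. It maps memory, never writes to it, and checks that every byte reads as zero without any signal being raised.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of private anonymous memory and read
 * every byte without ever writing to it.
 * Returns 1 if every byte read as zero, 0 otherwise, -1 if mmap failed. */
int untouched_anon_reads_zero(size_t len)
{
    assert(len > 0);

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    /* volatile so the compiler really performs each load */
    volatile unsigned char *p = map;
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        /* Any fault taken by this read is resolved by the kernel;
         * the process does not get SIGSEGV. */
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(map, len);
    return all_zero;
}
```

Calling untouched_anon_reads_zero(1 << 20) from main should return 1 on Linux.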

Fun fact: in micro-benchmarks, make sure you write each page of memory for input arrays, otherwise you're actually looping over the same physical 4k or 2M page of zeros repeatedly and will get L1D cache hits even though you still get TLB misses (and soft page faults)! gcc will optimize malloc+memset(0) to calloc, but std::vector will actually write all the memory whether you want it to or not. memset on global arrays is not optimized out, so that works. (Or non-zero initialized arrays will be file-backed in the data segment.)
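A rough sketch of "write each page first", assuming Linux and 4k-style base pages as reported by sysconf. The function names are invented for this example, and mincore(2) is used only as one convenient way to count resident pages; it is not something the benchmark itself needs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write one byte in every page so each page gets its own physical frame
 * instead of sharing the zero page. */
void touch_every_page(volatile unsigned char *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < len; off += page)
        buf[off] = 1;
}

/* Count the pages of a page-aligned region that mincore(2) reports as
 * resident. Returns -1 on error. */
long resident_pages(void *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore(buf, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++)
            count += vec[i] & 1;   /* bit 0 set means "resident" */
    }
    free(vec);
    return count;
}

/* Map `npages` anonymous pages, write to each one, and return how many are
 * resident afterwards (-1 on error). */
long resident_after_touch(size_t npages)
{
    assert(npages > 0);
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    touch_every_page(buf, len);
    long resident = resident_pages(buf, len);
    munmap(buf, len);
    return resident;
}
```

In a real benchmark you would call touch_every_page (or just memset with a non-zero value) on the input array before the timed region.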

Note, I'm leaving out the difference between mapped vs. wired. i.e. whether an access will trigger a soft/minor page fault to update the page tables, or whether it's just a TLB miss and the hardware page-table walk will find a mapping (to the zero page).

But stack memory below RSP may not be mapped at all, so touching it without moving RSP first can be an invalid page fault instead of a "minor" page fault to sort out copy-on-write.

Stack memory has an interesting twist: the stack size limit is something like 8MiB (ulimit -s), but in Linux the initial stack for the first thread of a process is special. For example, I set a breakpoint in _start in a hello-world (dynamically linked) executable, and looked at /proc/<PID>/smaps for it:

7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
Size:                132 kB
Rss:                   8 kB
Pss:                   8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:             8 kB
...

Only 8KiB of stack has been referenced and is backed by physical pages. That's expected, since the dynamic linker doesn't use a lot of stack.

Only 132KiB of stack is even mapped into the process's virtual address space. But special magic stops mmap(NULL, ...) from randomly choosing pages within the 8MiB of virtual address space that the stack could grow into.
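If you want the same numbers from inside a running program rather than from a debugger session, something like the following sketch could work. It assumes /proc/self/smaps is readable and has a [stack] entry; stack_smaps_kb is a name made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value in kB of `field` (for example "Size:" or "Rss:") from the
 * [stack] entry of /proc/self/smaps.
 * Returns -1 if the file, the entry, or the field cannot be found. */
long stack_smaps_kb(const char *field)
{
    assert(field != NULL);

    FILE *f = fopen("/proc/self/smaps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    size_t flen = strlen(field);
    int in_stack = 0;
    long value = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            /* Header line of a mapping: "start-end perms offset dev inode name" */
            in_stack = strstr(line, "[stack]") != NULL;
        } else if (in_stack && strncmp(line, field, flen) == 0) {
            sscanf(line + flen, "%ld", &value);
            break;
        }
    }

    fclose(f);
    return value;
}
```

Printing stack_smaps_kb("Size:") and stack_smaps_kb("Rss:") early in main should give values in the same ballpark as the smaps output above, though the exact numbers depend on the system.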

Touching memory below the current stack mapping but within the stack limit causes the kernel to grow the stack mapping (in the page-fault handler).

(But only if rsp is adjusted first; the red-zone is only 128 bytes below rsp, so ulimit -s unlimited doesn't make touching memory 1GB below rsp grow the stack to there, but it will if you decrement rsp to there and then touch memory.)
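Here is a sketch of watching the main thread's stack mapping grow, assuming Linux, glibc's alloca, a 4k page size, and that the code runs on the initial thread. The function names are invented for illustration. alloca moves the stack pointer down first, and only then are the pages touched, which is the order the paragraph above says matters.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Size in kB of the [stack] mapping in /proc/self/maps, or -1 if not found. */
long stack_mapping_kb(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        if (strstr(line, "[stack]") != NULL &&
            sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            kb = (long)((hi - lo) / 1024);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Reserve `bytes` of stack (this moves the stack pointer down first), then
 * touch one byte per page, starting at the highest address and walking down. */
__attribute__((noinline))
static void use_stack(size_t bytes)
{
    const size_t page = 4096;   /* assumed page size */
    volatile unsigned char *buf = alloca(bytes);
    for (size_t off = bytes; off > 0; off -= (off > page ? page : off))
        buf[off - 1] = 1;
}

/* Use `bytes` of stack, then report the size of the [stack] mapping in kB. */
long stack_kb_after_using(size_t bytes)
{
    assert(bytes > 0);
    use_stack(bytes);
    return stack_mapping_kb();
}
```

Comparing stack_mapping_kb() before and after stack_kb_after_using(512 * 1024) should show the mapping has grown to at least 512 kB, and it stays that size after the function returns.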

This only applies to the initial/main thread's stack. pthreads just uses mmap(MAP_ANONYMOUS|MAP_STACK) to map an 8MiB chunk that can't grow. (MAP_STACK is currently a no-op.) So thread stacks can't grow after allocation (except manually with MAP_FIXED if there's space below them), and aren't affected by ulimit -s unlimited.
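As a simplified illustration of that kind of fixed-size allocation (not the actual pthreads implementation), a thread stack could be reserved like this. The helper names are hypothetical, and a real threading library does more bookkeeping around the mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve a fixed-size region to be used as a thread stack.
 * Physical pages are still only allocated lazily, as they are touched.
 * Returns the lowest address of the region, or NULL on failure. */
void *alloc_fixed_stack(size_t size)
{
    assert(size > 0);
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Stacks grow down on x86-64, so the initial stack pointer would be the
 * highest address of the region. */
void *fixed_stack_top(void *base, size_t size)
{
    return (unsigned char *)base + size;
}

/* Allocate a stack, write to its highest byte, and free it again.
 * Returns 1 on success, 0 on failure. */
int fixed_stack_roundtrip(size_t size)
{
    unsigned char *base = alloc_fixed_stack(size);
    if (base == NULL)
        return 0;

    unsigned char *top = fixed_stack_top(base, size);
    top[-1] = 0x5a;                       /* first byte a push would use */
    int ok = (top[-1] == 0x5a);

    return munmap(base, size) == 0 && ok;
}
```

For example, fixed_stack_roundtrip(8u << 20) reserves 8MiB of address space but only dirties a single page of it.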

This magic preventing other things from choosing addresses in the stack-growth region doesn't exist for mmap(MAP_GROWSDOWN), so do not use it to allocate new thread stacks. (Otherwise you could end up with something using up the virtual address space below the new stack, leaving it unable to grow). Just allocate the full 8MiB. See also the related question "Where are the stacks for the other threads located in a process virtual address space?"

MAP_GROWSDOWN does have a grow-on-demand feature, described in the mmap(2) man page, but there's no growth limit (other than coming close to an existing mapping), so (according to the man page) it's based on a guard-page like Windows uses, not like the primary thread's stack.

Touching memory multiple pages below the bottom of a MAP_GROWSDOWN region might segfault (unlike with Linux's primary-thread stack). Compilers targeting Linux don't generate stack "probes" to make sure each 4k page is touched in order after a big allocation (e.g. local array or alloca), so that's another reason MAP_GROWSDOWN isn't safe for stacks.

Compilers do emit stack probes on Windows.

(MAP_GROWSDOWN might not even work at all, see @BeeOnRope's comment. It was never very safe to use for anything, because stack clash security vulnerabilities were possible if the mapping grows close to something else. So just don't use MAP_GROWSDOWN for anything ever. I'm leaving in the mention to describe the guard-page mechanism Windows uses, because it's interesting to know that Linux's primary-thread stack design isn't the only one possible.)

answered Oct 03 '22 23:10

Peter Cordes

Stack allocation uses the same virtual memory mechanism that governs every other memory access: page faults. For example, suppose your current stack has 7ffd41ad2000-7ffd41af3000 as its bounds:

myaut@panther:~> grep stack /proc/self/maps                                                     
7ffd41ad2000-7ffd41af3000 rw-p 00000000 00:00 0      [stack]

If the CPU then tries to read or write at address 7ffd41ad1fff (one byte below the lowest mapped stack address), it raises a page fault, because the OS has not yet provided a page for that address. So push, or any other memory-accessing instruction that uses %rsp as its address, triggers a page fault when it reaches such an unmapped page.

In the page-fault handler, the kernel checks whether the stack can be grown. If so, it allocates a page backing the faulting address (7ffd41ad1000-7ffd41ad2000); otherwise, say because the stack ulimit is exceeded, it delivers SIGSEGV.
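The decision described above can be sketched as a toy model in C. This is not kernel code: the function name, the 4k page size, and the convention of returning 0 for SIGSEGV are all assumptions made for the example, and the real handler performs additional checks.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE UINT64_C(4096)   /* assumed page size */

/* Toy model of the grow-or-fail decision for a fault at `addr`.
 * [lo, hi) is the current stack mapping and `limit` is the stack size limit
 * in bytes. Returns the lower bound of the mapping after the fault is
 * handled, or 0 where the model would deliver SIGSEGV. */
uint64_t model_stack_fault(uint64_t lo, uint64_t hi,
                           uint64_t limit, uint64_t addr)
{
    assert(lo < hi);

    if (addr >= hi)
        return 0;      /* above the stack: not a stack-growth fault */
    if (addr >= lo)
        return lo;     /* already inside the mapping: nothing to grow */

    /* Grow the mapping down to the page containing the faulting address. */
    uint64_t new_lo = addr & ~(MODEL_PAGE_SIZE - 1);
    if (hi - new_lo > limit)
        return 0;      /* would exceed the stack limit: SIGSEGV */
    return new_lo;
}
```

With the bounds from the example, a fault at 7ffd41ad1fff and an 8MiB limit, the model returns 7ffd41ad1000, matching the page range mentioned above.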

answered Oct 03 '22 23:10

myaut

Tags: memory-management, linux, memory, x86-64