
Linux process stack overrun by local variables (stack guarding)

From What is the purpose of the _chkstk() function?:

At the end of the stack, there is one guard page mapped as inaccessible memory -- if the program accesses it (because it is trying to use more stack than is currently mapped), there's an access violation.

_chkstk() is a special compiler-helper function which ensures that there is enough space for the local variables

i.e. it's doing some stack probing (here is an LLVM example).
This case is Windows-specific. So Windows has some solution to the problem.

Let's consider similar conditions under Linux (or some other Unix-like OS): a function has a lot of local variables. The first stack variable access is beyond the stack segment (e.g. mov eax, [esp-LARGE_NUMBER], where esp-LARGE_NUMBER is beyond the stack segment). Are there any features to prevent a possible page fault or similar in Linux (perhaps other Unix-like OSes) or in development tools like gcc, clang, etc.? Does -fstack-check (GCC stack checking) somehow solve this problem? This answer states that it is something very similar to _chkstk().

P.S. These posts 1, 2 didn't help a lot.

P.P.S. In general, the question is about the differences in how OSes (foremost Linux vs. Windows) handle a huge amount of stack variables that extend beyond the stack segment. Both the C++ and C tags are added because it's about producing native Linux binaries, but the assembly code is compiler-related.

narotello asked Feb 04 '20 13:02




1 Answer

_chkstk does stack probes to make sure each page is touched in order after a (potentially) large allocation, e.g. an alloca, because Windows will only grow the stack one page at a time, up to the stack size limit.

Touching that "guard page" triggers stack growth. It doesn't guard against stack overflow; I think you're misinterpreting the meaning of "guard page" in this usage.

The function name is also potentially misleading. The _chkstk docs simply say: Called by the compiler when you have more than one page of local variables in your function. It doesn't truly check anything; it just makes sure that intervening pages have been touched before memory around esp/rsp gets used. That is, the only possible effects are nothing (possibly including a valid soft page fault) or an invalid page fault on stack overflow (trying to touch a page that Windows refused to grow the stack to include). It ensures that the stack pages are allocated by unconditionally writing them.

I guess you could look at this as checking for a stack clash by making sure you touch an unmappable page before continuing in the case of stack overflow.
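The page-by-page touching described above can be sketched in portable C. This is only an illustration of the idea, not what _chkstk actually contains: the helper name probe_pages_downward and its parameters are made up here, and it walks an ordinary buffer rather than the real stack.

```c
#include <stddef.h>

/* Hypothetical helper (not a real runtime function): touch one byte in every
 * page-sized step of the region [top - len, top), walking from high addresses
 * to low ones, the direction in which the stack grows.
 * Returns the number of probes performed. */
size_t probe_pages_downward(volatile unsigned char *top, size_t len,
                            size_t page_size)
{
    size_t probes = 0;
    size_t offset = 0;

    if (page_size == 0)
        return 0;

    while (offset < len) {
        size_t remaining = len - offset;
        size_t step = remaining < page_size ? remaining : page_size;

        offset += step;
        /* Read-modify-write that leaves the value unchanged, in the spirit
         * of a probe instruction such as `or qword ptr [rsp], 0`. */
        top[-(ptrdiff_t)offset] |= 0;
        probes++;
    }
    return probes;
}
```

Called as probe_pages_downward(buf + sizeof buf, sizeof buf, 4096), it never moves more than one page between touches, which is the property Windows stack growth relies on.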


Linux will grow the main-thread stack (see footnote 1) by any number of pages (up to the stack size limit set by ulimit -s; default 8MiB) when you touch memory below the old stack pages, provided the address is above the current stack pointer.

If you touch memory outside the growth limit, or don't move the stack pointer first, it will just segfault. Thus Linux doesn't need stack probes; the code only has to move the stack pointer by as many bytes as it wants to reserve. Compilers know this and emit code accordingly.

See also How is Stack memory allocated when using 'push' or 'sub' x86 instructions? for more low-level details on what the Linux kernel does, and what glibc pthreads on Linux does.
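The growth limit mentioned above (what ulimit -s shows) can also be read from inside a program. A minimal sketch using the POSIX getrlimit call follows; the wrapper name stack_soft_limit is made up for this example. Note that bash's ulimit -s prints the value in units of 1024 bytes, while getrlimit reports bytes.

```c
#include <sys/resource.h>

/* Hypothetical wrapper: store the soft stack-size limit (in bytes) in *out.
 * Returns 0 on success, -1 if getrlimit fails.
 * A value of RLIM_INFINITY means the limit is "unlimited". */
int stack_soft_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    *out = rl.rlim_cur;
    return 0;
}
```

With the common default of ulimit -s 8192, this would be expected to yield 8388608, but the actual value depends on the environment the process was started in.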

A sufficiently large alloca on Linux can move the stack all the way past the bottom of the stack growth region, beyond the guard pages below that, and into another mapping; this is a Stack Clash (https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash). It of course requires that the program uses a potentially huge size for alloca, dependent on user input. The mitigation for CVE-2017-1000364 is to leave a 1MiB guard region, requiring a much larger alloca than normal to get past the guard pages.

This 1MiB guard region is below the ulimit -s (8MiB) growth limit, not below the current stack pointer. It's separate from Linux's normal stack growth mechanism.
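The reasoning behind the guard-region size can be written down as a simplified worst-case model (an assumption for illustration, not kernel code): a single stack-pointer move with no touches in between can only land beyond the guard region if it is larger than the region itself. The names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* 1 MiB guard region, the figure quoted in the text above. */
#define GUARD_GAP_BYTES ((size_t)1 << 20)

/* Simplified model: starting from the lowest mapped stack byte, can one
 * untouched step of `unprobed_step` bytes land below a guard region of
 * `guard_bytes` bytes? Only if the step is strictly larger than the guard. */
bool step_can_skip_guard(size_t unprobed_step, size_t guard_bytes)
{
    return unprobed_step > guard_bytes;
}
```

In this model, code that touches the stack at least every 4096 bytes can never jump the 1MiB region, while a single unprobed 2MiB alloca could.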


gcc -fstack-check

The effect of gcc -fstack-check is essentially the same as what's always needed on Windows (which MSVC does by calling _chkstk): touch stack pages in between previous and new stack pointer when moving it by a large or runtime-variable amount.

But the purpose / benefit of these probes is different on Linux; it's never needed for correctness in a bug-free program on GNU/Linux. It "only" defends against stack-clash bugs/exploits.

On x86-64 GNU/Linux, gcc -fstack-check will (for functions with a VLA or large fixed-size array) add a loop that does stack probes with or qword ptr [rsp], 0 along with sub rsp, 4096. For known fixed array sizes, it can be just a single probe. The code-gen doesn't look very efficient; it's normally never used on this target. (Godbolt compiler explorer example that passes a stack array to a non-inline function.)
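A test case of the kind described (a large stack array passed to a non-inline function) might look like the sketch below. It is not the original Godbolt example; the function names and the 64KiB size are chosen here for illustration, and the noinline attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <string.h>

#define BIG_FRAME_BYTES (64 * 1024)

/* Kept out of line so the compiler must really materialize the array
 * in the caller's stack frame. */
__attribute__((noinline))
size_t fill_and_sum(unsigned char *buf, size_t n)
{
    size_t sum = 0;

    memset(buf, 1, n);
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* A function whose frame spans many pages: a candidate for stack probes. */
size_t big_frame(void)
{
    unsigned char buf[BIG_FRAME_BYTES];

    return fill_and_sum(buf, sizeof buf);
}
```

Compiling this with gcc -O2 -S, once with and once without -fstack-check, and comparing the prologue of big_frame should show whether probes were added; the exact instructions depend on the GCC version and target.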

https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html describes some GCC internal parameters that control what -fstack-check does.

If you want absolute safety against stack-clash attacks, this should do it. It's not needed for normal operation, though, and a 1MiB guard region is enough for most people.


Note that -fstack-protector-strong is completely different, and guards against overwrite of the return address by buffer overruns on local arrays. Nothing to do with stack clashes, and the attack is against stuff already on the stack above a small local array, not against other regions of memory by moving the stack a lot.


Footnote 1: Thread stacks on Linux (for threads other than the initial one) have to be fully allocated up front because the magic growth feature doesn't work. Only the initial aka main thread of a process can have that.

(There's an mmap(MAP_GROWSDOWN) feature but it's not safe because there's no limit, and because nothing stops other dynamic allocations from randomly picking a page close below the current stack, limiting future growth to a tiny size before a stack clash. Also because it only grows if you touch the guard page, so it would need stack probes. For these showstopper reasons, MAP_GROWSDOWN is not used for thread stacks. The internal mechanism for the main stack relies on different magic in the kernel which does prevent other allocations from stealing space.)
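To make "fully allocated up front" concrete, here is a minimal sketch of reserving a fixed-size stack with one inaccessible guard page below it, using plain mmap and mprotect. It is an illustration under stated assumptions, not glibc's actual thread-stack allocation code; the helper name alloc_guarded_stack is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map a fixed-size stack region with one PROT_NONE
 * guard page at its low end. Returns the lowest usable address, or NULL on
 * failure. *mapped_bytes receives the total mapping size (guard included),
 * which is needed later for munmap(result - page_size, *mapped_bytes). */
void *alloc_guarded_stack(size_t usable_bytes, size_t *mapped_bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || usable_bytes == 0)
        return NULL;

    size_t pg = (size_t)page;
    size_t usable = (usable_bytes + pg - 1) / pg * pg;  /* round up to pages */
    size_t total = usable + pg;                         /* plus guard page */

    /* Reserve everything as inaccessible first... */
    unsigned char *base = mmap(NULL, total, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* ...then open up everything above the lowest page. A stack that grows
     * down into that lowest page faults instead of running into whatever
     * mapping lies below. */
    if (mprotect(base + pg, usable, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return NULL;
    }

    *mapped_bytes = total;
    return base + pg;
}
```

Nothing here grows on demand: the whole size is fixed at mapping time, which is why there is no need for the kernel's main-stack growth logic.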

Peter Cordes answered Oct 13 '22 10:10