For a 32-bit windows application is it valid to use stack memory below ESP for temporary swap space without explicitly decrementing ESP?
Consider a function that returns a floating point value in ST(0)
. If our value is currently in EAX we would, for example,
PUSH EAX
FLD [ESP]
ADD ESP,4 // or POP EAX, etc
// return...
Or without modifying the ESP register, we could just :
MOV [ESP-4], EAX
FLD [ESP-4]
// return...
In both cases the same thing happens except that in the first case we take care to decrement the stack pointer before using the memory, and then to increment it afterwards. In the latter case we do not.
Notwithstanding any real need to persist this value on the stack (reentrancy issues, function calls between PUSH
ing and reading the value back, etc) is there any fundamental reason why writing to the stack below ESP like this would be invalid?
The x86 CPU has eight general purpose registers: eax , ebx , ecx , edx , esp , ebp , esi , and edi . These registers are 32 bits (4 bytes) in size. A program can access registers as 32-bit (4 bytes), 16-bit (2 bytes), or 8-bit (1 byte) values.
The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. And pop : The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.
TL:DR: no, there are some SEH corner cases that can make it unsafe in practice, as well as being documented as unsafe. @Raymond Chen recently wrote a blog post that you should probably read instead of this answer.
His example of a code-fetch page-fault I/O error that can be "fixed" by prompting the user to insert a CD-ROM and retry is also my conclusion for the only practically-recoverable fault if there aren't any other possibly-faulting instructions between store and reload below ESP/RSP.
Or if you ask a debugger to call a function in the program being debugged, it will also use the target process's stack.
This answer has a list of some things you'd think would potentially step on memory below ESP, but actually don't, which might be interesting. It seems to be only SEH and debuggers that can be a problem in practice.
First of all, if you care about efficiency, can't you avoid x87 in your calling convention? movd xmm0, eax
is a more efficient way to return a float
that was in an integer register. (And you can often avoid moving FP values to integer registers in the first place, using SSE2 integer instructions to pick apart exponent / mantissa for a log(x)
, or integer add 1 for nextafter(x)
.) But if you need to support very old hardware, then you need a 32-bit x87 version of your program as well as an efficient 64-bit version.
But there are other use-cases for small amounts of scratch space on the stack where it would be nice to save a couple instructions that offset ESP/RSP.
Trying to collect up the combined wisdom of other answers and discussion in comments under them (and on this answer):
It is explicitly documented as being not safe by Microsoft: (for 64-bit code, I didn't find an equivalent statement for 32-bit code but I'm sure there is one)
Stack Usage (for x64)
All memory beyond the current address of RSP is considered volatile: The OS, or a debugger, may overwrite this memory during a user debug session, or an interrupt handler.
So that's the documentation, but the interrupt reason stated doesn't make sense for the user-space stack, only the kernel stack. The important part is that they document it as not guaranteed safe, not the reasons given.
Hardware interrupts can't use the user stack; that would let user-space crash the kernel with mov esp, 0
, or worse take over the kernel by having another thread in the user-space process modify return addresses while an interrupt handler was running. This is why kernels always configure things so interrupt context is pushed onto the kernel stack.
Modern debuggers run in a separate process, and are not "intrusive". Back in 16-bit DOS days, without a multi-tasking protected-memory OS to give each task its own address space, debuggers would use the same stack as the program being debugged, between any two instructions while single-stepping.
@RossRidge points out that a debugger might want to let you call a function in the context of the current thread, e.g. with SetThreadContext
. This would run with ESP/RSP just below the current value. This could obviously have side-effects for the process being debugged (intentional on the part of the user running the debugger), but clobbering local variables of the current function below ESP/RSP would be an undesirable and unexpected side-effect. (So compilers can't put them there.)
(In a calling convention with a red-zone below ESP/RSP, a debugger could respect that red-zone by decrementing ESP/RSP before making the function call.)
There are existing program that intentionally break when being debugged at all, and consider this a feature (to defend against efforts to reverse-engineer them).
Related: the x86-64 System V ABI (Linux, OS X, all other non-Windows systems) does define a red-zone for user-space code (64-bit only): 128 bytes below RSP that is guaranteed not to be asynchronously clobbered. Unix signal handlers can run asynchronously between any two user-space instructions, but the kernel respects the red-zone by leaving a 128 byte gap below the old user-space RSP, in case it was in use. With no signal handlers installed, you have an effectively unlimited red-zone even in 32-bit mode (where the ABI does not guarantee a red-zone). Compiler-generated code, or library code, of course can't assume that nothing else in the whole program (or in a library the program called) has installed a signal handler.
So the question becomes: is there anything on Windows that can asynchronously run code using the user-space stack between two arbitrary instructions? (i.e. any equivalent to a Unix signal handler.)
As far as we can tell, SEH (Structured Exception Handling) is the only real obstacle to what you propose for user-space code on current 32 and 64-bit Windows. (But future Windows could include a new feature.) And I guess debugging if you happen ask your debugger to call a function in the target process/thread as mentioned above.
In this specific case, not touching any other memory other than the stack, or doing anything else that could fault, it's probably safe even from SEH.
SEH (Structured Exception Handling) lets user-space software have hardware exceptions like divide by zero delivered somewhat similarly to C++ exceptions. These are not truly asynchronous: they're for exceptions triggered by instructions you ran, not for events that happened to come after some random instruction.
But unlike normal exceptions, one thing a SEH handler can do is resume from where the exception occurred. (@RossRidge commented: SEH handlers are are initially called in the context of the unwound stack and can choose to ignore the exception and continue executing at the point where the exception occurred.)
So that's a problem even if there's no catch()
clause in the current function.
Normally HW exceptions can only be triggered synchronously. e.g. by a div
instruction, or by a memory access which could fault with STATUS_ACCESS_VIOLATION
(the Windows equivalent of a Linux SIGSEGV segmentation fault). You control what instructions you use, so you can avoid instructions that might fault.
If you limit your code to only accessing stack memory between the store and reload, and you respect the stack-growth guard page, your program won't fault from accessing [esp-4]
. (Unless you reached the max stack size (Stack Overflow), in which case push eax
would fault, too, and you can't really recover from this situation because there's no stack space for SEH to use.)
So we can rule out STATUS_ACCESS_VIOLATION
as a problem, because if we get that on accessing stack memory we're hosed anyway.
An SEH handler for STATUS_IN_PAGE_ERROR
could run before any load instruction. Windows can page out any page it wants to, and transparently page it back in if it's needed again (virtual memory paging). But if there's an I/O error, your Windows attempts to let your process handle the failure by delivering a STATUS_IN_PAGE_ERROR
Again, if that happens to the current stack, we're hosed.
But code-fetch could cause STATUS_IN_PAGE_ERROR
, and you could plausibly recover from that. But not by resuming execution at the place where the exception occurred (unless we can somehow remap that page to another copy in a highly fault-tolerant system??), so we might still be ok here.
An I/O error paging in the code that wants to read what we stored below ESP rules out any chance of reading it. If you weren't planning to do that anyway, you're fine. A generic SEH handler that doesn't know about this specific piece of code wouldn't be trying to do that anyway. I think usually a STATUS_IN_PAGE_ERROR
would at most try to print an error message or maybe log something, not try to carry on whatever computation was happening.
Accessing other memory in between the store and reload to memory below ESP could trigger a STATUS_IN_PAGE_ERROR
for that memory. In library code, you probably can't assume that some other pointer you passed isn't going to be weird and the caller is expecting to handle STATUS_ACCESS_VIOLATION
or PAGE_ERROR for it.
Current compilers don't take advantage of space below ESP/RSP on Windows, even though they do take advantage of the red-zone in x86-64 System V (in leaf functions that need to spill / reload something, exactly like what you're doing for int -> x87.) That's because MS says it isn't safe, and they don't know whether SEH handlers exist that could try to resume after an SEH.
Things that you'd think might be a problem in current Windows, and why they're not:
The guard page stuff below ESP: as long as you don't go too far below the current ESP, you'll be touching the guard page and trigger allocation of more stack space instead of faulting. This is fine as long as the kernel doesn't check user-space ESP and find out that you're touching stack space without having "reserved" it first.
kernel reclaim of pages below ESP/RSP: apparently Windows doesn't currently do this. So using a lot of stack space once ever will keep those pages allocated for the rest of your process lifetime, unless you manually VirtualAlloc(MEM_RESET)
them. (The kernel would be allowed to do this, though, because the docs say memory below RSP is volatile. The kernel could effectively zero it asynchronously if it wants to, copy-on-write mapping it to a zero page instead of writing it to the pagefile under memory pressure.)
APC (Asynchronous Procedure Calls): They can only be delivered when the process is in an "alertable state", which means only when inside a call
to a function like SleepEx(0,1)
. call
ing a function already uses an unknown amount of space below E/RSP, so you already have to assume that every call
clobbers everything below the stack pointer. Thus these "async" callbacks are not truly asynchronous with respect to normal execution the way Unix signal handlers are. (fun fact: POSIX async io does use signal handlers to run callbacks).
Console-application callbacks for ctrl-C and other events (SetConsoleCtrlHandler
). This looks exactly like registering a Unix signal handler, but in Windows the handler runs in a separate thread with its own stack. (See RbMm's comment)
SetThreadContext
: another thread could change our EIP/RIP asynchronously while this thread is suspended, but the whole program has to be written specially for that to make any sense. Unless it's a debugger using it. Correctness is normally not required when some other thread is messing around with your EIP unless the circumstances are very controlled.
And apparently there are no other ways that another process (or something this thread registered) can trigger execution of anything asynchronously with respect to the execution of user-space code on Windows.
If there are no SEH handlers that could try to resume, Windows more or less has a 4096 byte red-zone below ESP (or maybe more if you touch it incrementally?), but RbMm says nobody takes advantage of it in practice. This is unsurprising because MS says not to, and you can't always know if your callers might have done something with SEH.
Obviously anything that would synchronously clobber it (like a call
) must also be avoided, again same as when using the red-zone in the x86-64 System V calling convention. (See https://stackoverflow.com/tags/red-zone/info for more about it.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With