I have the following code, which goes into infinite recursion and triggers a seg fault when it exhausts the stack limit allocated to it. I am trying to capture this segmentation fault and exit gracefully. However, I was not able to catch the fault with a handler installed for any of the signal numbers.
(A customer is facing this issue and wants a solution for such a use case. Increasing the stack size with something like "limit stacksize 128M" makes his test pass. However, he is asking for a graceful exit rather than a seg fault. The following code simply reproduces the actual issue; it is not what the actual algorithm does.)
Any help is appreciated. If something is incorrect in the way I am trying to catch the signal, please let me know that too. To compile: g++ test.cc -std=c++0x
#include <iostream>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <string.h>

int recurse_and_crash (int val)
{
    // Print rough call stack depth at intervals.
    if ((val %1000) == 0)
    {
        std::cout << "\nval: " << val;
    }
    return val + recurse_and_crash (val+1);
}

void signal_handler(int signal, siginfo_t * si, void * arg)
{
    std::cout << "Caught segfault\n";
    exit(0);
}

int main(int argc, char ** argv)
{
    int signal = 11; // SIGSEGV
    if (argc == 2)
    {
        signal = std::stoi(std::string(argv[1]));
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof(struct sigaction));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = signal_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(signal, &sa, NULL);
    recurse_and_crash (1);
}

This is a surprisingly complex problem to solve. I will not give working code at this point, but rather focus on a few "nifty" issues that you have run into or, as you continue coding for this, will run into.
First, why does the fault recur instead of reaching your handler?
The reason is that while signal handlers are "execution context transfers", by default they do not have their own stack. That means if you receive a signal as a consequence of an overflowed stack, delivering it requires space on that same stack for the context passed to the handler, and that attempt simply raises the same signal again.
To make sure signal handlers run on their own separate / preallocated stack, use sigaltstack() and the SA_ONSTACK flag for sigaction().
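As a minimal sketch of that setup (not a complete solution), the following registers an alternate stack for the calling thread and installs a handler with SA_ONSTACK. The names install_on_alt_stack and on_overflow and the 64 KiB size are assumptions made for illustration. The handler uses write() and _exit() rather than std::cout and exit(), because only async-signal-safe functions should be called from a signal handler.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical handler: reports and terminates using async-signal-safe calls only.
static void on_overflow(int, siginfo_t*, void*)
{
    static const char msg[] = "Caught signal on alternate stack\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    _exit(EXIT_FAILURE);
}

// Hypothetical helper: returns 0 on success, -1 on failure.
// `signo` would typically be SIGSEGV (and possibly SIGBUS in a second call).
int install_on_alt_stack(int signo)
{
    // Assumed size; it must be at least MINSIGSTKSZ for the platform.
    static const size_t alt_size = 64 * 1024;

    // The memory must stay allocated for as long as the handler is installed.
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = malloc(alt_size);
    if (ss.ss_sp == NULL) return -1;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_overflow;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;  // run the handler on the alternate stack
    return sigaction(signo, &sa, NULL);
}
```

Calling install_on_alt_stack(SIGSEGV) before recurse_and_crash(1) would be the intended use. The alternate stack applies only to the calling thread, so each thread that may overflow would need its own.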
Second, depending on "how badly" the stack overruns (your test program may not trigger this, but a real-world program may), the attempted memory access that causes the overflow may raise signals other than SIGSEGV.
Your example lets you install the handler "unspecifically" for any signal number, but in practice that is both insufficient and confusing: you sending your app a SIGUSR1, or the shell/terminal sending it a SIGTTOU on being backgrounded, is absolutely not indicative of a stack overflow.
This means there's another issue: which signals are to be expected when making an "out of stack" memory access as a consequence of a stack overflow? And how can you know that a specific signal you got was due to a stack access?
The answer to that, again, is more complex than it appears at first sight:
- [...] SIGSEGV.
- [...] SIGBUS instead.
- [...] SIGSEGV or SIGBUS. (For example, on x86, certain instructions raise #GP while others raise #PF for the same memory address read/write, and the Linux kernel possibly translates one to SIGBUS and the other to SIGSEGV.)
- [...] (char local_to_blow_stack[1ULL << 40]; memset(&local_to_blow_stack, 0, 1);) and, just as it happens, something else valid is mapped at "whatever your stack is minus a terabyte", that access will in fact just work. Without the compiler creating "assist" code for you to identify such accesses, it's actually possible that you've blown the stack and still make a number of successful, non-signaling memory accesses before eventually hitting a memory region that triggers a signal.
So "just catching signals", even "catching all signals that may possibly occur as a consequence of a stack overflow", is insufficient. Within the signal handler, you need to decode the memory access location, and possibly the operation / CPU instruction, to verify that the attempted memory access actually was a "stack access out of bounds".
It is possible for a thread to retrieve its own stack boundaries: pthread_getattr_np() (https://man7.org/linux/man-pages/man3/pthread_getattr_np.3.html) can be used for this, at least on Linux (_np implies "non-portable"; it isn't guaranteed to be available on all systems, and others may have different interfaces to retrieve this information). But finding the memory location that was accessed depends, again, on the signal and on the accessing instruction. Often (but not always) it's in the siginfo, in the si_addr field.
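Retrieving the boundaries could look roughly like the sketch below, assuming Linux with glibc (older glibc versions also need -pthread at link time). The StackBounds struct and the function name current_stack_bounds are made up for this example. pthread_attr_getstack() reports the lowest address of the stack and its size, so the high end is computed from those two values.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // pthread_getattr_np is a GNU extension (g++ normally defines this)
#endif
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical container for the result: the stack occupies [low, high).
struct StackBounds
{
    uintptr_t low;
    uintptr_t high;
};

// Hypothetical helper: fills `out` with the calling thread's stack range.
// Returns false if the information could not be retrieved.
bool current_stack_bounds(StackBounds* out)
{
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return false;

    void* addr = NULL;
    size_t size = 0;
    int rc = pthread_attr_getstack(&attr, &addr, &size);
    pthread_attr_destroy(&attr);  // release resources held by the attribute object
    if (rc != 0) return false;

    out->low = reinterpret_cast<uintptr_t>(addr);
    out->high = out->low + size;
    return true;
}
```

Since pthread_getattr_np() does not appear on the list of async-signal-safe functions, it seems safer to call this once per thread up front and store the result than to call it from inside the handler.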
From what I remember, exactly which signals fill si_addr under exactly what circumstances, and whether the address in there is e.g. the instruction issuing the memory access or the memory location of the attempted access, is somewhat system- and hardware-dependent (Linux may behave differently from Windows or MacOSX, and differently on ARM than on x86).
So you would also need to validate that "the si_addr in this siginfo_t is somewhere-near the signaled thread's stack", but possibly also validate that the instruction that caused it was actually a memory access / si_addr can be "traced back" to the instruction that faulted. That (finding the faulting instruction's address / the program counter) ... requires decoding the other argument for the signal handler, the ucontext_t ... and there you're deep deep deep [ recurse infinity here ] in HW / OS specifics.
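To illustrate how platform-specific this gets, here is a sketch that pulls the interrupted program counter out of the third handler argument. It assumes Linux with glibc and covers only x86-64 and AArch64; the names fault_pc, record_pc and g_last_pc are invented for this example, and other systems lay out ucontext_t differently.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes REG_RIP in glibc headers (g++ normally defines this)
#endif
#include <signal.h>
#include <ucontext.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the program counter saved in the signal
// context, or 0 on platforms this sketch does not know about.
uintptr_t fault_pc(void* ucontext_arg)
{
    ucontext_t* uc = static_cast<ucontext_t*>(ucontext_arg);
#if defined(__linux__) && defined(__x86_64__)
    return static_cast<uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return static_cast<uintptr_t>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return 0;
#endif
}

// Hypothetical global, for demonstration only: last program counter seen.
volatile uintptr_t g_last_pc = 0;

// Hypothetical SA_SIGINFO-style handler that only records the program counter.
void record_pc(int, siginfo_t*, void* ucontext_arg)
{
    g_last_pc = fault_pc(ucontext_arg);
}
```

Knowing the program counter is only the first step. Deciding whether the instruction at that address was a stack access would still require decoding the instruction itself, which this sketch does not attempt.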
At this point I'd like to terminate; a "simple" but not perfect solution just needs an alternate signal stack, and the handler to retrieve the current stack boundaries via pthread_getattr_np(), to compare the si_addr against. If your life or that of others depends on the correct answer, remember the above though.
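The comparison step of that "simple but not perfect" approach might be sketched as follows. It assumes the handler is installed with SA_SIGINFO | SA_ONSTACK on an alternate stack, and that the low end of the thread's stack was recorded beforehand in g_stack_low. The names near_stack_limit, g_stack_low, kSlack and overflow_handler are hypothetical, and the 64 KiB tolerance is an arbitrary assumption.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical globals, assumed to be filled in before the handler is installed.
uintptr_t g_stack_low = 0;                // lowest address of the thread's stack
const uintptr_t kSlack = 64 * 1024;       // assumed tolerance around the low end

// True if `addr` lies within `slack` bytes of the low end of the stack,
// on either side. Clamps instead of wrapping around at the address-space edges.
bool near_stack_limit(uintptr_t addr, uintptr_t low, uintptr_t slack)
{
    uintptr_t lo = (low > slack) ? low - slack : 0;
    uintptr_t hi = (low > UINTPTR_MAX - slack) ? UINTPTR_MAX : low + slack;
    return addr >= lo && addr < hi;
}

// Hypothetical handler: exits cleanly only if the fault looks like a stack overflow.
void overflow_handler(int signo, siginfo_t* si, void*)
{
    uintptr_t addr = reinterpret_cast<uintptr_t>(si->si_addr);
    if (near_stack_limit(addr, g_stack_low, kSlack)) {
        static const char msg[] = "Stack overflow detected, exiting\n";
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)ignored;
        _exit(EXIT_FAILURE);
    }
    // Not recognised as a stack overflow: restore the default action and
    // return, so the faulting instruction runs again and fails normally.
    signal(signo, SIG_DFL);
}
```

A very large frame can still skip far past this window, as described above, so a fault that fails this check is not proof that the stack is intact.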