Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is a segmentation fault not recoverable?

Following a previous question of mine, most comments say "just don't, you are in a limbo state, you have to kill everything and start over". There is also a "safeish" workaround.

What I fail to understand is why a segmentation fault is inherently nonrecoverable.

The moment in which writing to protected memory is caught - otherwise, the SIGSEGV would not be sent.

If the moment of writing to protected memory can be caught, I don't see why - in theory - it can't be reverted, at some low level, and have the SIGSEGV converted to a standard software exception.

Please explain why after a segmentation fault the program is in an undetermined state, as very obviously, the fault is thrown before memory was actually changed (I am probably wrong and don't see why). Had it been thrown after, one could create a program that changes protected memory, one byte at a time, getting segmentation faults, and eventually reprogramming the kernel - a security risk that is not present, as we can see the world still stands.

  1. When exactly does a segmentation fault happen (= when is SIGSEGV sent)?
  2. Why is the process in an undefined behavior state after that point?
  3. Why is it not recoverable?
  4. Why does this solution avoid that unrecoverable state? Does it even?
like image 844
Gulzar Avatar asked Dec 07 '21 10:12

Gulzar


People also ask

How do you overcome a segmentation fault?

It can be resolved by having a base condition to return from the recursive function. A pointer must point to valid memory before accessing it.

Can you recover from a segfault?

On both Windows and Linux, the segfault handler function is passed a "context struct", which includes the state of the registers at the failure site. Ostensibly, this is so people can repair the problem that caused the segfault (it also lets you do nifty things like userspace segment handling).

What happens when a segmentation fault occurs?

A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).

Does segmentation fault mean memory leak?

Most memory errors which aren't memory leaks end up resulting in a segmentation fault. A segmentation fault is raised when the operating system realizes that your program is trying to access memory that it shouldn't have access to.


2 Answers

When exactly does segmentation fault happen (=when is SIGSEGV sent)?

When you attempt to access memory you don’t have access to, such as accessing an array out of bounds or dereferencing an invalid pointer. The signal SIGSEGV is standardized but different OS might implement it differently. "Segmentation fault" is mainly a term used in *nix systems, Windows calls it "access violation".

Why is the process in undefined behavior state after that point?

Because one or several of the variables in the program didn’t behave as expected. Let’s say you have some array that is supposed to store a number of values, but you didn’t allocate enough room for all them. So only those you allocated room for get written correctly, and the rest written out of bounds of the array can hold any values. How exactly is the OS to know how critical those out of bounds values are for your application to function? It knows nothing of their purpose.

Furthermore, writing outside allowed memory can often corrupt other unrelated variables, which is obviously dangerous and can cause any random behavior. Such bugs are often hard to track down. Stack overflows for example are such segmentation faults prone to overwrite adjacent variables, unless the error was caught by protection mechanisms.

If we look at the behavior of "bare metal" microcontroller systems without any OS and no virtual memory features, just raw physical memory - they will just silently do exactly as told - for example, overwriting unrelated variables and keep on going. Which in turn could cause disastrous behavior in case the application is mission-critical.

Why is it not recoverable?

Because the OS doesn’t know what your program is supposed to be doing.

Though in the "bare metal" scenario above, the system might be smart enough to place itself in a safe mode and keep going. Critical applications such as automotive and med-tech aren’t allowed to just stop or reset, as that in itself might be dangerous. They will rather try to "limp home" with limited functionality.

Why does this solution avoid that unrecoverable state? Does it even?

That solution is just ignoring the error and keeps on going. It doesn’t fix the problem that caused it. It’s a very dirty patch and setjmp/longjmp in general are very dangerous functions that should be avoided for any purpose.

We have to realize that a segmentation fault is a symptom of a bug, not the cause.

like image 118
Lundin Avatar answered Sep 28 '22 16:09

Lundin


Please explain why after a segmentation fault the program is in an undetermined state

I think this is your fundamental misunderstanding -- the SEGV does not cause the undetermined state, it is a symptom of it. So the problem is (generally) that the program is in an illegal, unrecoverable state WELL BEFORE the SIGSEGV occurs, and recovering from the SIGSEGV won't change that.

  • When exactly does segmentation fault happen (=when is SIGSEGV sent)?

The only standard way in which a SIGSEGV occurs is with the call raise(SIGSEGV);. If this is the source of a SIGSEGV, then it is obviously recoverable by using longjump. But this is a trivial case that never happens in reality. There are platform-specific ways of doing things that might result in well-defined SEGVs (eg, using mprotect on a POSIX system), and these SEGVs might be recoverable (but will likely require platform specific recovery). However, the danger of undefined-behavior related SEGV generally means that the signal handler will very carefully check the (platform dependent) information that comes along with the signal to make sure it is something that is expected.

  • Why is the process in undefined behavior state after that point?

It was (generally) in undefined behavior state before that point; it just wasn't noticed. That's the big problem with Undefined Behavior in both C and C++ -- there's no specific behavior associated with it, so it might not be noticed right away.

  • Why does this solution avoid that unrecoverable state? Does it even?

It does not, it just goes back to some earlier point, but doesn't do anything to undo or even identify the undefined behavior that cause the problem.

like image 21
Chris Dodd Avatar answered Sep 28 '22 16:09

Chris Dodd