sem_post, signal handlers, and undefined behavior

Does this use of sem_post() in a signal handler rely on undefined behavior?

/* 
 * excerpted from the 2017-09-15 Linux man page for sem_wait(3)
 * http://man7.org/linux/man-pages/man3/sem_wait.3.html
 */
...
sem_t sem;
...
static void
handler(int sig)
{
    write(STDOUT_FILENO, "sem_post() from handler\n", 24);
    if (sem_post(&sem) == -1) {
        write(STDERR_FILENO, "sem_post() failed\n", 18);
        _exit(EXIT_FAILURE);
    }
}

The semaphore sem has static storage duration. While the call to sem_post() is async-signal-safe, the POSIX.1-2008 treatment of signal actions seems to disallow referencing that semaphore itself:

[T]he behavior is undefined if the signal handler refers to any object other than errno with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t, or if the signal handler calls any function defined in this standard other than one of the [explicitly async-signal-safe functions].

pilcrow asked Feb 02 '18 14:02



1 Answer

Technically, yes; there are situations where the behaviour is undefined.

I myself use this pattern quite a lot, and so do almost all signal-aware programs I've looked at. It is expected to work in practice and to be portable across systems, even though no standard guarantees it.

The POSIX.1 standard leaves it as Undefined Behaviour, not because it expects programs to avoid such access, but because defining the safe access situations would be too complicated and could limit future implementations, for little or no gain: there is a well-known workaround for all such accesses, namely a dedicated thread catching the signals.


Added on 2018-06-21:

Let's first summarize the cases where the sem_post(&sem) access is valid in a signal handler (i.e., where one can refer to objects with static storage duration, for example via any async-signal-safe function), based on POSIX.1-2018:

  • When the process has only one thread, the signal handler is executed as a result of a thread in that same process calling abort(), raise(), kill(), pthread_kill(), or sigqueue(), and the signal is/was not blocked in the thread that was used to execute the handler.

  • When the process has only one thread, the signal was blocked when it became pending, and it was delivered before the call that unblocked the signal returns.

This leaves out the most common cases: multithreaded processes, and also handlers for signals generated externally to the process (for example, SIGINT when the process runs in the foreground, and the user presses Ctrl+C; or SIGHUP when the session the process is running in is closed).

My understanding of the situation is that everybody expects the following to hold on any sane POSIXy architecture:

  • Signal handlers that refer to objects with static storage duration via async-signal-safe functions will not trigger undefined behaviour.

  • Using multithread-safe (MT-safe) async-signal-safe functions on objects with static storage duration works exactly the same in a multithreaded process as it does in a single-threaded process.

  • Signals triggered by alarm(), setitimer(), and timer_settime() behave the same as those triggered by raise() or sigqueue().

  • Signals sent by other processes behave the same as those triggered by raise() or sigqueue() in the target process, the only difference being that some fields in the siginfo structure have different values.

There is even a small possibility that the wording should say "accesses" instead of "refers to". That would indeed allow passing the address of any object with static storage duration to async-signal-safe functions like sem_post(), even in multithreaded processes, as Carlo Wood's answer posits.

However, I believe that the reason for this wording is more subtle, and involves differences in hardware implementations regarding concurrent accesses and the contexts signal handlers are executed in: specifying the behaviour in the cases where some POSIX OSes might behave differently was too complicated, so it was simply left Undefined instead.

The rest of my answer attempts to describe those differences, for developers who wish to produce reliable, robust programs that work on all POSIXy systems but find the subtleties of the current wording in the POSIX.1 spec hard to follow.


The issue of exactly what objects a signal handler can access safely is complex. Rather than open up the whole can of worms, the POSIX standard drafters just punted it, and declared the behaviour undefined.

The hardest part to define would be the details related to concurrent access (not just by other threads in the same process, but also by the kernel) and trap representations. (Because we are considering only objects with static storage duration, we can avoid shared memory and all associated complexity there.) In particular, if an object has trap representations and is modified non-atomically, it is possible that an intermediate stage of the assignment causes a trap. That trap itself may cause a signal to be raised, although there may be hardware limitations on some architectures.

So, anything related to trap representations is basically too complicated to resolve in the standard.

Okay, let's assume the standard would limit safe read access to objects with static storage duration that are not being concurrently modified by the interrupted thread, any other thread in the process, or the kernel; and safe write access to objects with static storage duration that are not being concurrently read or modified by the interrupted thread, any other thread in the process, or the kernel. Let's also assume that the object being accessed has no trap representations at all.

We still have a few hardware-specific signals to consider: SIGSEGV, SIGBUS, SIGILL, and SIGFPE at least. Unfortunately, some architectures may have additional signals not known at this time, so we'd need to define the type of signal affected: signals that are raised by the kernel when memory is accessed (SIGFPE only if the architecture raises it when loading the value, and not just when doing arithmetic etc. on such values). If the access to an object with static storage duration may raise one of these signals, then the access is not safe, as it can lead to a cascade of signal handlers. (Because standard POSIX signals are not queued, only the first signal of each kind gets to execute, and the process state can be lost, forcing the kernel to kill the process.)

From the POSIX C compiler's point of view, the entire situation gets much more complicated if you consider a signal handler that obtains a pointer as payload (si_value.sival_ptr in the siginfo_t): does accessing the object through that pointer lead to Undefined Behaviour or not, depending on whether the target has static storage duration?

On all current POSIXy systems, a signal handler can safely access an object with static storage duration if the access uses atomic built-ins, or if the object is not being read or modified by any other thread or the kernel and none of its intermediate storage forms can cause a signal to be raised. This holds in POSIX realtime signal handlers, and in POSIX signal handlers for signals not raised by memory access. It is likely, but not guaranteed, to remain true in the future, too, and that is at the core of why the POSIX standard does not standardize it.

The cold fact is, there is a POSIX-compliant workaround for all the patterns requiring access to an object with static storage duration: a separate thread, dedicated to handling signals via sigwaitinfo(), with all those signals blocked in all other threads. That thread is not limited to using async-signal safe functions, nor do the other signal handler limitations apply to it. (If we consider the interaction between signal delivery and the code it interrupts, even with handlers defined with the SA_RESTART flag, one could argue that the thread-based approach is the better one of the two.)

Simply put: because known workarounds exist, and defining the safe access cases would be too complicated and limit future implementations, the POSIX standard does not standardize this traditional use case at all. This is not because it is expected to fail (quite the opposite: it works fine on all current POSIXy systems), but because defining the safe access cases is not worth the complexity and the possible limitations. The only exceptions are errno and volatile sig_atomic_t, which both require, and have, support from POSIX C compilers.

Nominal Animal answered Oct 01 '22 22:10
