
What happens if a signal handler is invoked while at a cancellation point?

Suppose an application is blocked at a cancellation point, for example read, and a signal is received and a signal handler invoked. Glibc/NPTL implements cancellation points by enabling asynchronous cancellation for the duration of the syscall, so as far as I can tell, asynchronous cancellation will remain in effect for the entire duration of the signal handler. This would of course be horribly wrong, as there are plenty of functions that are not async-cancel-safe but which are required to be safe to call from signal handlers.

This leaves me with two questions:

  • Am I wrong or is the glibc/NPTL behavior really this dangerously broken? If so, is such dangerous behavior conformant?
  • What, according to POSIX, is supposed to happen if a signal handler is invoked while the process is executing a function which is a cancellation point?

Edit: I've almost convinced myself that any thread which is a potential target of pthread_cancel must ensure that functions which are cancellation points can never be called from a signal handler in that thread's context:

On the one hand, any signal handler that can be invoked in a thread that might be cancelled and which uses any async-cancel-unsafe functions must disable cancellation before calling any function which is a cancellation point. This is because, from the perspective of the code interrupted by the signal, any such cancellation would be equivalent to asynchronous cancellation. On the other hand, a signal handler cannot disable cancellation, unless the code that will be running when the signal handler is invoked only uses async-signal-safe functions, because pthread_setcancelstate is not async-signal-safe.

R.. GitHub STOP HELPING ICE Avatar asked Mar 23 '11 16:03



2 Answers

To answer the first half of my own question: glibc does exhibit the behavior I predicted. A signal handler that runs while its thread is blocked at a cancellation point runs with asynchronous cancellation enabled. To see this effect, create a thread that calls a cancellation point that will block forever (or for a long time), wait a moment, send the thread a signal, wait a moment again, then cancel and join it. The signal handler should fiddle with some volatile variables in a way that makes it clear that it ran for an unpredictable amount of time before being terminated asynchronously.

As for whether POSIX allows this behavior, I'm still not 100% certain. POSIX states:

Whenever a thread has cancelability enabled and a cancellation request has been made with that thread as the target, and the thread then calls any function that is a cancellation point (such as pthread_testcancel() or read()), the cancellation request shall be acted upon before the function returns. If a thread has cancelability enabled and a cancellation request is made with the thread as a target while the thread is suspended at a cancellation point, the thread shall be awakened and the cancellation request shall be acted upon. It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if:

  • The thread is suspended at a cancellation point and the event for which it is waiting occurs

  • A specified timeout expired

before the cancellation request is acted upon.

Presumably executing a signal handler is not a case of being "suspended", so I'm leaning towards interpreting glibc's behavior here as non-conformant.

R.. GitHub STOP HELPING ICE Avatar answered Sep 24 '22 08:09



Rich,

I came across this question while doing the AC-safe documentation review that Alex Oliva was working on for glibc.

It is my opinion that the GNU C Library implementation (nptl-based) is not broken. While it is true that asynchronous cancellation is enabled around blocking syscalls (which are required to be cancellation points), such behaviour should still be conformant.

It is also true that a signal taken after asynchronous cancellation is enabled will result in a signal handler running with asynchronous cancellation enabled. It is also true that doing anything in that handler that is not also asynchronous cancellation safe is dangerous.

It is also true that if another thread calls pthread_cancel with the thread running the signal handler as the target, the cancellation will be acted upon immediately. This is still in line with the POSIX wording of "before the function returns" (in this case read has not returned and the target thread is in the signal handler).

The problem with the signal is that it causes the thread to be in two simultaneous states: it is both still in a cancellation point and executing instructions. If the cancellation request arrives, it is my opinion that it is conformant for it to be acted upon immediately, though the Austin Group might clarify.

The problem with the glibc implementation is that it requires all signal handlers executed by the to-be-cancelled thread to call only asynchronous-cancel-safe functions. This is a non-obvious requirement that doesn't stem from the standard, but it doesn't render the implementation non-conformant.

One potential solution to the fragility of signal handlers:

  • Do not enable async-cancellation for blocking syscalls, instead enable a new IN_SYSCALL bit in the cancellation implementation.

  • When pthread_cancel is called and the target thread has IN_SYSCALL set then send SIGCANCEL to the thread as normally would be done for async-cancel, but the SIGCANCEL handler does nothing (other than the side effect of interrupting the syscall).

  • The wrapper around the syscalls will look for cancellation to have been sent and cancel the thread before the wrapper returns.

While posting this on Stack Overflow was fun, I don't know anyone else who reads this and can answer your question in the detail required.

I think any further discussion should happen on the Austin Group mailing list as part of a POSIX standards discussion, or on libc-alpha as part of a glibc implementation discussion.

Carlos O'Donell Avatar answered Sep 25 '22 08:09
