I'm working on implementing pthread cancellation on Linux without any of the "unpleasant behavior" (some might say bugs) discussed in some of my other recent questions. The Linux/glibc approach to pthread cancellation so far has been to treat it as something that needs no kernel support and can be handled purely at the library level, by enabling asynchronous cancellation before making a syscall and restoring the previous cancellation type after the syscall returns. This has at least two problems, one of them extremely serious:
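The library-level pattern described above can be sketched with the public pthread API. This is only an illustration of the idea, not glibc's internal code: sketch_cancellable_read is a hypothetical name, and SYS_read stands in for any cancellable syscall. Note that POSIX only requires pthread_cancel(), pthread_setcancelstate() and pthread_setcanceltype() to be async-cancel-safe, so outside the C library itself this pattern is formally undefined.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper illustrating "async cancellation around the syscall".
 * Not glibc's actual implementation. */
long sketch_cancellable_read(int fd, void *buf, size_t len)
{
    int oldtype, ignored;

    /* From here on, a pending cancellation may act at any instruction. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);

    /* One hazard inherent in this pattern: if cancellation acts after the
     * kernel has completed the read (data already consumed) but before the
     * type is restored, the thread is cancelled anyway and the side effect
     * of the completed syscall is lost. */
    long ret = syscall(SYS_read, fd, buf, len);

    /* Restore the caller's cancellation type (usually deferred). */
    pthread_setcanceltype(oldtype, &ignored);
    return ret;
}
```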
My first idea for fixing the problem was to set a flag indicating that the thread is at a cancellation point, rather than enabling asynchronous cancellation, and, when this flag is set, have the cancellation signal handler check the saved instruction pointer to see whether it points to a syscall instruction (arch-specific). If so, the syscall has not completed and would be restarted when the signal handler returns, so we can cancel. If not, I assumed the syscall had already returned and deferred cancellation. However, this has a race condition: the thread may not have reached the syscall instruction at all yet, in which case the syscall could block and never respond to the cancellation. Another, smaller problem is that non-cancellable syscalls performed from a signal handler wrongly became cancellable if the cancellation-point flag was set when the signal handler was entered.
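A minimal x86-64 sketch of the instruction-pointer check might look like the following. It assumes Linux's behavior of rewinding the saved instruction pointer back onto the two-byte syscall instruction (0f 05) when the kernel is going to restart an interrupted syscall (for many syscalls, when the handler was installed with SA_RESTART). The names at_cancel_point, ip_at_syscall and install_cancel_handler are hypothetical, and other architectures would need their own opcode check.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* True if ip points at an x86-64 "syscall" instruction (bytes 0f 05). */
bool ip_at_syscall(const unsigned char *ip)
{
    return ip[0] == 0x0f && ip[1] == 0x05;
}

#if defined(__x86_64__)
/* Hypothetical flag the syscall wrapper would set while at a cancellation point. */
static _Thread_local volatile sig_atomic_t at_cancel_point;

static void cancel_handler(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)sig;
    (void)si;

    if (!at_cancel_point)
        return; /* not at a cancellation point: defer */

    const unsigned char *ip =
        (const unsigned char *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    if (!ip_at_syscall(ip))
        return; /* assume the syscall already returned: defer */

    /* Syscall not completed: a real implementation would act on the
     * cancellation here (run cleanup handlers and exit the thread). */
}

int install_cancel_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = cancel_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
#endif
```

The "defer" branch after the opcode check is where the race lives: an instruction pointer a few instructions before the syscall fails the check just like one that is already past it.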
I'm considering a new approach and would like feedback on it. The conditions that must be met are:
The idea I have in mind requires specialized assembly for the cancellable syscall wrapper. The basic idea would be:
The cancel operation would then involve:
The cancellation signal handler would then:
The only problem I see so far is in step 1 of the signal handler: if the handler decides not to act, then after it returns the thread could be left blocked in the syscall, ignoring the pending cancellation request. For this, I see two potential solutions:
Any thoughts on which approach is best, or on whether there are more fundamental flaws I'm missing?
Solution 2 feels like less of a hack. I don't think it would cause the problem you suggest, because cancellable syscalls called within the signal handler will check the cancellation flag in TLS, which must already have been set if the cancellation signal handler has run and monkeyed with the signal mask anyway.
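A cancellable syscall that checks a per-thread flag on entry, as this answer assumes, could be sketched as follows. Everything here is hypothetical: cancel_requested and checked_cancellable_read are illustrative names, and returning -ECANCELED stands in for actually acting on the cancellation. It only shows that any call, including one nested inside a signal handler, sees the flag once it has been set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-thread cancellation flag, set before the cancellation
 * signal handler does anything else. */
_Thread_local volatile sig_atomic_t cancel_requested;

/* Hypothetical wrapper: checks the TLS flag before entering the kernel.
 * A real implementation would run cleanup handlers and exit the thread
 * instead of returning -ECANCELED. */
long checked_cancellable_read(int fd, void *buf, size_t len)
{
    if (cancel_requested)
        return -ECANCELED;

    /* A signal arriving between the check above and the syscall below is
     * the window that the signal-handler logic still has to close. */
    long ret = syscall(SYS_read, fd, buf, len);
    return ret < 0 ? -errno : ret;
}
```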
(It seems like it would be much easier for implementers if every blocking syscall took a sigmask parameter à la pselect().)
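pselect()'s sigmask argument installs a signal mask atomically for the duration of the wait, which is what closes the check-then-block race. A minimal sketch, assuming SIGUSR1 as a stand-in for a cancellation signal (setup_sigusr1, wait_until_signalled and got_signal are hypothetical names): the signal stays blocked during normal execution, the flag is checked, and only then is the signal unblocked, atomically, while pselect() sleeps.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal;
static sigset_t wait_mask; /* mask applied only while blocked in pselect */

static void on_signal(int sig)
{
    (void)sig;
    got_signal = 1;
}

void setup_sigusr1(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    /* Block SIGUSR1 for normal execution; keep the previous mask, minus
     * SIGUSR1, as the mask to use while waiting. */
    sigprocmask(SIG_BLOCK, &block, &wait_mask);
    sigdelset(&wait_mask, SIGUSR1);
}

/* Returns 1 if the signal was (or already had been) delivered, 0 on timeout. */
int wait_until_signalled(long timeout_sec)
{
    if (got_signal)
        return 1; /* already requested: never block */

    /* A signal sent after the check stays pending (it is blocked) and is
     * delivered as soon as pselect swaps in wait_mask. */
    struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };
    int r = pselect(0, NULL, NULL, NULL, &ts, &wait_mask);
    return (r < 0 && errno == EINTR) || got_signal;
}
```

Without the atomic mask swap, a signal landing between the flag check and the blocking call would be handled before the thread blocks, and the thread would then sleep through it, which is the same shape of race described in the question.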