
Why does sem_wait not unblock (and return -1) on an interrupt?

I have a programme using sem_wait. The Posix specification says:

The sem_wait() function is interruptible by the delivery of a signal.

Additionally, in the section about errors it says:

[EINTR] - A signal interrupted this function.

However, in my programme, sending a signal does not unblock the call (and return -1 as indicated in the spec).

A minimal example can be found below. This programme hangs and sem_wait never unblocks after the signal is sent.

#include <semaphore.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

sem_t sem;

void sighandler(int sig) {
  printf("Inside sighandler\n");
}

void *thread_listen(void *arg) {
  signal(SIGUSR1, &sighandler);
  printf("sem_wait = %d\n", sem_wait(&sem));
  return NULL;
}

int main(void) {

  pthread_t thread;

  sem_init(&sem, 0, 0); 

  pthread_create(&thread, NULL, &thread_listen, NULL);

  sleep(1);
  raise(SIGUSR1);

  pthread_join(thread, NULL);

  return 0;
}

The programme outputs Inside sighandler then hangs.

There is another question here about this, but it doesn't really provide any clarity.

Am I misunderstanding what the spec says? FYI my computer uses Ubuntu GLIBC 2.31-0ubuntu9.

alphabetical98332 asked Apr 26 '20 22:04




2 Answers

There are three reasons why this program doesn't behave as you expect, only two of which are fixable.

  1. As pointed out in David Schwartz’s answer, in a multi-threaded program, raise sends a signal to the thread that calls raise.

    To get the signal sent to the thread you wanted, in this test program, change the raise(SIGUSR1) to pthread_kill(thread, SIGUSR1). However, if you want that specific thread to handle SIGUSR1 when it’s sent to the entire process, what you need to do is use pthread_sigmask to block SIGUSR1 in all of the threads except the one that's supposed to handle it. (See below for more detail on this.)

  2. On systems that use glibc, signal installs a signal handler that does not interrupt blocking system calls. To get a signal handler that does, you need to use sigaction and set sa_flags to a value that doesn’t include SA_RESTART. For instance,

      struct sigaction sa;
      sigemptyset(&sa.sa_mask);
      sa.sa_handler = sighandler;
      sa.sa_flags = 0;
      sigaction(SIGUSR1, &sa, 0);
    

    Note: memset(&sa, 0, sizeof sa) is not guaranteed to have the same effect as sigemptyset(&sa.sa_mask).

    Note: Signal handlers are process-global, so it doesn’t matter which thread you call sigaction on. In almost all cases, multithreaded programs should do all their sigaction calls in main before creating any threads, just to make sure the signal handlers are active before any signals can happen.

  3. The signal could be delivered to the thread before the thread has a chance to call sem_wait. If that happens, the signal handler will be called and return, and then sem_wait will be called and it will block forever. In this test program, you can make this arbitrarily unlikely by increasing the length of the sleep in main, but there is no way to make it impossible. This is the unfixable reason.

    There are a small number of system calls that atomically unblock signals while sleeping, and then block them again before returning to user space, such as sigsuspend, sigwaitinfo, and pselect. These are the only system calls for which this race condition can be avoided.

    Best practice for a multi-threaded program that has to deal with signals is to have one thread devoted to signal handling. To make that work reliably, you should block all signals except for synchronous CPU exceptions (SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGSEGV, SIGSYS, and SIGTRAP) at the very beginning of main, before creating any threads. Then you set a do-nothing signal handler (with SA_RESTART) for the signals you want to handle. These handlers will never actually be called; their purpose is to prevent the kernel from killing the process due to the default action of SIGUSR1 or whatever. The set of signals you care about must include all of the signals for user interrupts: SIGHUP, SIGINT, SIGPWR, SIGQUIT, SIGTERM, SIGTSTP, SIGXCPU, SIGXFSZ. Finally, you create the signal-handling thread, which loops calling sigwaitinfo for the appropriate set of signals and dispatches messages to the rest of the threads using pipes, condition variables, or anything else other than signals. This thread must never block in any system call other than sigwaitinfo.

    In the case of this test program, the signal-handling thread would respond to SIGUSR1 by calling sem_post(&sem). This would either wake up the listener thread, or it would cause the listener thread not to become blocked on sem_wait in the first place.

zwol answered Nov 15 '22 19:11


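In a multi-threaded program, raise sends a signal to the thread that calls raise. To signal a different thread, use pthread_kill(thread, ...). Alternatively, kill(getpid(), ...) sends a process-directed signal, which may be delivered to any thread that does not have the signal blocked.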

David Schwartz answered Nov 15 '22 19:11