Cancelling thread that is stuck on epoll_wait

I'm doing some event handling with C++ and pthreads. I have a main thread that reads from an event queue I defined, and a worker thread that fills the event queue. The queue is of course thread safe.

The worker thread has a list of file descriptors and creates an epoll instance to get events on those file descriptors. It uses epoll_wait to wait for events on the fd's.

Now the problem. Assuming I want to terminate my application cleanly, how can I cancel the worker thread properly? epoll_wait is not listed among the cancellation points in pthreads(7), so I cannot rely on it reacting properly to pthread_cancel.

The worker thread's main loop looks like this:

while(m_WorkerRunning) {
    epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
    //handle events and insert to queue
}

m_WorkerRunning is set to true when the thread starts, and it looks like I can stop the thread by setting m_WorkerRunning to false from the main thread. The problem is that epoll_wait can, in theory, wait forever, so the loop may never re-check the flag.

Another solution I thought about: instead of waiting forever (-1), I can wait for a fixed timeout X, handle the no-events case properly, and exit the loop when m_WorkerRunning == false so the worker thread terminates cleanly. The main thread then sets m_WorkerRunning to false and sleeps for X. However, I'm not sure about the performance of such an epoll_wait, and I'm also not sure what the correct X would be. 500ms? 1s? 10s?
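The timeout variant described above could look roughly like the sketch below. It is a minimal illustration, not a recommendation for a particular X: g_workerRunning stands in for m_WorkerRunning, and the function name, the event buffer size, and the return value (a count of waits that timed out) are assumptions made up for this example.

```cpp
#include <sys/epoll.h>
#include <atomic>
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for the question's m_WorkerRunning flag.
std::atomic<bool> g_workerRunning{true};

// Waits on epfd in slices of timeout_ms until g_workerRunning becomes false.
// Returns the number of waits that timed out with no events, or -1 on a
// real epoll_wait error.
int worker_loop_with_timeout(int epfd, int timeout_ms)
{
    assert(timeout_ms >= 0);    // -1 would block forever and defeat the polling
    const int kMaxEvents = 16;  // assumed buffer size
    epoll_event events[kMaxEvents];
    int timeouts = 0;

    while (g_workerRunning.load()) {
        int n = epoll_wait(epfd, events, kMaxEvents, timeout_ms);
        if (n == -1) {
            if (errno == EINTR)
                continue;       // interrupted by a signal: re-check the flag
            return -1;          // any other error ends the loop
        }
        if (n == 0) {
            ++timeouts;         // nothing arrived within timeout_ms
            continue;           // loop condition re-checks the flag
        }
        // Handle events[0..n) and insert them into the queue here.
    }
    return timeouts;
}
```

With this shape the main thread can store false into the flag and then join the worker instead of sleeping for X; the join should return within roughly one timeout. The trade-off is between shutdown latency and how often an idle worker wakes up.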

I'd like to hear some advice from experienced people!

More relevant information: the fd's I'm waiting for events on are devices in /dev/input, so technically I'm building some sort of input subsystem. The target OS is Linux (latest kernel) on the ARM architecture.

Thanks!

Dmitry Kudryavtsev asked Sep 04 '13 12:09



2 Answers

alk's answer is almost correct. What it gets wrong, however, is very dangerous.

If you are going to send a signal in order to wake up epoll_wait, never use epoll_wait. You must use epoll_pwait, or you risk a race in which the wait never wakes up.

Signals arrive asynchronously. If your SIGUSR1 arrives after you've checked your shutdown condition, but before your loop returns to the epoll_wait, then the signal will not interrupt the wait (as there is none yet), and the thread will not exit either: it enters epoll_wait and blocks as if no signal had been sent.

How likely this is depends on how long the loop body takes relative to the time spent in the wait, but it is a bug either way.

Another problem with alk's answer is that it does not check why the wait was interrupted. EINTR can be caused by any handled signal, including ones unrelated to your exit.

For more information, see the man page for pselect. epoll_pwait works in a similar way.
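A minimal sketch of that pattern follows, under a few assumptions: SIGUSR1 is the wake-up signal, the handler itself records the shutdown request, and the names g_stopRequested, on_wakeup_signal, prepare_wakeup_signal and worker_loop_pwait are invented for this example. SIGUSR1 stays blocked everywhere except inside epoll_pwait, so a signal that arrives between the flag check and the wait remains pending and interrupts the wait as soon as it starts.

```cpp
#include <sys/epoll.h>
#include <pthread.h>
#include <signal.h>
#include <cassert>
#include <cerrno>

// Hypothetical shutdown flag, written only by the signal handler.
volatile sig_atomic_t g_stopRequested = 0;

extern "C" void on_wakeup_signal(int) { g_stopRequested = 1; }

// Installs the handler and blocks SIGUSR1 in the calling thread. Intended to
// run before the worker is created, so the worker inherits the blocked mask.
// Returns 0 on success, -1 on failure.
int prepare_wakeup_signal()
{
    struct sigaction sa = {};
    sa.sa_handler = on_wakeup_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, nullptr) == -1)
        return -1;

    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return pthread_sigmask(SIG_BLOCK, &block, nullptr) == 0 ? 0 : -1;
}

// Worker loop: returns 0 once a shutdown request is seen, -1 on error.
int worker_loop_pwait(int epfd)
{
    assert(epfd >= 0);

    // Mask used only while waiting: the current mask with SIGUSR1 unblocked.
    sigset_t waitMask;
    if (pthread_sigmask(SIG_BLOCK, nullptr, &waitMask) != 0)
        return -1;
    sigdelset(&waitMask, SIGUSR1);

    epoll_event events[16];  // assumed buffer size
    while (!g_stopRequested) {
        // The mask swap and the wait happen atomically inside epoll_pwait.
        int n = epoll_pwait(epfd, events, 16, -1, &waitMask);
        if (n == -1) {
            if (errno == EINTR)
                continue;    // some signal arrived; the flag says whether to stop
            return -1;
        }
        // Handle events[0..n) and insert them into the queue here.
    }
    return 0;
}
```

Because the loop exits on the flag rather than on EINTR itself, an interruption by an unrelated signal simply goes back to waiting.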

Also, never send signals to threads using kill. Use pthread_kill instead. kill sends a process-directed signal, which may be delivered to any thread in the process that does not have the signal blocked. There is no guarantee that the intended thread will receive it, so an unrelated system call might be interrupted, or the worker might not be woken at all.
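As a rough sketch of the sending side, assuming the worker's pthread_t is available and SIGUSR1 is the wake-up signal: g_keepRunning stands in for m_WorkerRunning, and install_noop_handler and request_worker_shutdown are names invented for this example.

```cpp
#include <pthread.h>
#include <signal.h>
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the question's m_WorkerRunning flag.
std::atomic<bool> g_keepRunning{true};
static bool g_handlerInstalled = false;

// Handler that does nothing: the signal only has to interrupt the wait.
extern "C" void noop_handler(int) {}

// Installs the handler for SIGUSR1. Returns 0 on success, -1 on failure.
int install_noop_handler()
{
    struct sigaction sa = {};
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, nullptr) == -1)
        return -1;
    g_handlerInstalled = true;
    return 0;
}

// Asks one specific thread to stop. Returns 0 on success, otherwise the
// error number reported by pthread_kill.
int request_worker_shutdown(pthread_t worker)
{
    // Without a handler, SIGUSR1's default action terminates the process.
    assert(g_handlerInstalled);
    g_keepRunning.store(false);            // publish the request first
    return pthread_kill(worker, SIGUSR1);  // then wake exactly that thread
}
```

The caller would normally follow a successful request_worker_shutdown(worker) with pthread_join(worker, nullptr).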

Shachar Shemesh answered Oct 18 '22 20:10



You could send the thread a signal, which would interrupt the blocking call to epoll_wait(). If you do so, modify your code like this:

while(m_WorkerRunning) 
{
  int result = epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
  if (-1 == result)
  {
    if (EINTR == errno) /* errno and EINTR need #include <errno.h>. */
    {
      /* Handle shutdown request here. */ 
      break;
    }
    else
    {
      /* Error handling goes here. */
      break; /* Do not fall through to event handling with result == -1. */
    }
  }

  /* Handle events and insert to queue. */
}

A way to add a signal handler:

#include <signal.h>

/* A generic signal handler doing nothing */
void signal_handler(int sig)
{
  (void)sig; /* Mark the parameter as intentionally unused to avoid a compiler warning. */
}

/* Wrapper to set a signal handler */
int signal_handler_set(int sig, void (*sa_handler)(int))
{
  struct sigaction sa = {0};
  sa.sa_handler = sa_handler;
  return sigaction(sig, &sa, NULL);
}

To set this handler for the signal SIGUSR1, do:

if (-1 == signal_handler_set(SIGUSR1, signal_handler))
{
  perror("signal_handler_set() failed");
}

To send a signal SIGUSR1 from another process:

if (-1 == kill(<target process' pid>, SIGUSR1))
{
  perror("kill() failed");
}

To have a process send a signal to itself:

if (-1 == raise(SIGUSR1))
{
  perror("raise() failed");
}
alk answered Oct 18 '22 22:10
