Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use an eventfd with level triggered behaviour on epoll?

Registering a level triggered eventfd on epoll_ctl only fires once, when not decrementing the eventfd counter. To summarize the problem, I have observed that the epoll flags (EPOLLET, EPOLLONESHOT or None for level triggered behaviour) behave similar. Or in other words: Does not have an effect.

Could you confirm this bug?

I have an application with multiple threads. Each thread waits for new events with epoll_wait with the same epollfd. If you want to terminate the application gracefully, all threads have to be woken up. My thought was that you use the eventfd counter (EFD_SEMAPHORE|EFD_NONBLOCK) for this (with level triggered epoll behavior) to wake up all together. (Regardless of the thundering herd problem for a small number of filedescriptors.)

E.g. for 4 threads you write 4 to the eventfd. I was expecting epoll_wait returns immediately and again and again until the counter is decremented (read) 4 times. epoll_wait only returns once for every write.

Yep, I read all related manuals carefully ;)

#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>

static int event_fd = -1;
static int epoll_fd = -1;

void *thread(void *arg)
{
    (void) arg;

    for(;;) {
       struct epoll_event event;
       epoll_wait(epoll_fd, &event, 1, -1);

       /* handle events */
       if(event.data.fd == event_fd && event.events & EPOLLIN) {
           uint64_t val = 0;
           eventfd_read(event_fd, &val);
           break;
       }
    }

    return NULL;
}

int main(void)
{
    epoll_fd = epoll_create1(0);
    event_fd = eventfd(0, EFD_SEMAPHORE| EFD_NONBLOCK);

    struct epoll_event event;
    event.events = EPOLLIN;
    event.data.fd = event_fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, event_fd, &event);

    enum { THREADS = 4 };
    pthread_t thrd[THREADS];

    for (int i = 0; i < THREADS; i++)
        pthread_create(&thrd[i], NULL, &thread, NULL);

    /* let threads park internally (kernel does readiness check before sleeping) */
    usleep(100000);
    eventfd_write(event_fd, THREADS);

    for (int i = 0; i < THREADS; i++)
        pthread_join(thrd[i], NULL);
}
like image 543
linkjumper Avatar asked Jun 06 '20 12:06

linkjumper


People also ask

What is edge triggered epoll?

epoll provides both edge-triggered and level-triggered modes. In edge-triggered mode, a call to epoll_wait will return only when a new event is enqueued with the epoll object, while in level-triggered mode, epoll_wait will return as long as the condition holds.

How use Eventfd Linux?

As its return value, eventfd() returns a new file descriptor that can be used to refer to the eventfd object. The following values may be bitwise ORed in flags to change the behavior of eventfd(): EFD_CLOEXEC (since Linux 2.6. 27) Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.

Why epoll is faster than select?

The main difference between epoll and select is that in select() the list of file descriptors to wait on only exists for the duration of a single select() call, and the calling task only stays on the sockets' wait queues for the duration of a single call.


1 Answers

When you write to an eventfd, a function eventfd_signal is called. It contains the following line which does the wake up:

wake_up_locked_poll(&ctx->wqh, EPOLLIN);

With wake_up_locked_poll being a macro:

#define wake_up_locked_poll(x, m)                       \
    __wake_up_locked_key((x), TASK_NORMAL, poll_to_key(m))

With __wake_up_locked_key being defined as:

void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key)
{
    __wake_up_common(wq_head, mode, 1, 0, key, NULL);
}

And finally, __wake_up_common is being declared as:

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
            int nr_exclusive, int wake_flags, void *key,
            wait_queue_entry_t *bookmark)

Note the nr_exclusive argument and you will see that writing to an eventfd wakes only one exclusive waiter.

What does exclusive mean? Reading epoll_ctl man page gives us some insight:

EPOLLEXCLUSIVE (since Linux 4.5):

Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd. When a wakeup event occurs and multiple epoll file descriptors are attached to the same target file using EPOLLEXCLUSIVE, one or more of the epoll file descriptors will receive an event with epoll_wait(2).

You do not use EPOLLEXCLUSIVE when adding your event, but to wait with epoll_wait every thread has to put itself to a wait queue. Function do_epoll_wait performs the wait by calling ep_poll. By following the code you can see that it adds the current thread to a wait queue at line #1903:

__add_wait_queue_exclusive(&ep->wq, &wait);

Which is the explanation for what is going on - epoll waiters are exclusive, so only a single thread is woken up. This behavior has been introduced in v2.6.22-rc1 and the relevant change has been discussed here.

To me this looks like a bug in the eventfd_signal function: in semaphore mode it should perform a wake-up with nr_exclusive equal to the value written.

So your options are:

  • Create a separate epoll descriptor for each thread (might not work with your design - scaling problems)
  • Put a mutex around it (scaling problems)
  • Use poll, probably on both eventfd and epoll
  • Wake each thread separately by writing 1 with evenfd_write 4 times (probably the best you can do).
like image 200
StaceyGirl Avatar answered Sep 25 '22 19:09

StaceyGirl