Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

epoll order of events from epoll_wait

I have ported a program over to epoll from select to increase the number of sockets we can handle. I have added the sockets to the epoll FD and can read and write happily.

However, I am concerned about potential starvation of sockets even though I am using level triggered events. The scenario I am worried about is when there are more sockets ready than epoll_event structures. I know that the next time I call epoll_wait it will give me the rest of them, but I wonder what order I get them in with reguards to who didn't make the cut the last time vs this time.

An example: Say I have 10 sockets connected and added to the epoll fd. I only have enough memory for 5 epoll_event structures. Let's assume that in the time between each epoll_wait, all 10 sockets receive data. The first epoll_wait will return 5 epoll_event structures for processing, lets say it's sockets 1-5. I process those 5 sockets and while I am doing so, more data comes in and all 10 sockets have more data to be read. I enter the epoll_wait again and get 5 more epoll_event structures.

My question is what 5 sockets will I get on the second call to epoll_wait. Will it be sockets 1-5 because they were added to the epoll FD first? Or will I get sockets 6-10 because those events were raised before more data came in on sockets 1-5?

Essentially, is epoll_wait like a FIFO queue or does it simply scan an internal list of sockets (and thereby favoring the first sockets in the list).

EDIT: This is Linux kernel v4.9.62

like image 768
Mr. Rogers Avatar asked Jan 28 '23 02:01

Mr. Rogers


2 Answers

The observation by @jxh about the behavior is correct, and the behavior is long established (and was originally intended, if I correctly recall my email conversations with the implementer, Davide Libenzi, many years ago). It's unfortunate that it has not been documented so far. But, I've fixed that for the upcoming manual pages release, where epoll_wait(2) will carry the text:

If more than maxevents file descriptors are ready when epoll_wait() is called, then successive epoll_wait() calls will round robin through the set of ready file descriptors. This behavior helps avoid starvation scenarios, where a process fails to notice that additional file descriptors are ready because it focuses on a set of file descriptors that are already known to be ready.

like image 76
mtk Avatar answered Jan 31 '23 20:01

mtk


Perusing through the source file for epoll, one sees that the ready events are maintained in a linked list. Events are removed from the head of the list and added to the end of the list.

Based on that, the answer is that the descriptor order is based on the order in which they became ready.

like image 42
jxh Avatar answered Jan 31 '23 19:01

jxh