I have ported a program over to epoll from select to increase the number of sockets we can handle. I have added the sockets to the epoll FD and can read and write happily.
However, I am concerned about potential starvation of sockets even though I am using level triggered events. The scenario I am worried about is when there are more sockets ready than epoll_event
structures. I know that the next time I call epoll_wait
it will give me the rest of them, but I wonder what order I get them in with reguards to who didn't make the cut the last time vs this time.
An example:
Say I have 10 sockets connected and added to the epoll
fd. I only have enough memory for 5 epoll_event
structures. Let's assume that in the time between each epoll_wait
, all 10 sockets receive data. The first epoll_wait
will return 5 epoll_event
structures for processing, lets say it's sockets 1-5. I process those 5 sockets and while I am doing so, more data comes in and all 10 sockets have more data to be read. I enter the epoll_wait
again and get 5 more epoll_event
structures.
My question is what 5 sockets will I get on the second call to epoll_wait
. Will it be sockets 1-5 because they were added to the epoll
FD first? Or will I get sockets 6-10 because those events were raised before more data came in on sockets 1-5?
Essentially, is epoll_wait
like a FIFO queue or does it simply scan an internal list of sockets (and thereby favoring the first sockets in the list).
EDIT: This is Linux kernel v4.9.62
The observation by @jxh about the behavior is correct, and the behavior is long established (and was originally intended, if I correctly recall my email conversations with the implementer, Davide Libenzi, many years ago). It's unfortunate that it has not been documented so far. But, I've fixed that for the upcoming manual pages release, where epoll_wait(2) will carry the text:
If more than maxevents file descriptors are ready when
epoll_wait()
is called, then successiveepoll_wait()
calls will round robin through the set of ready file descriptors. This behavior helps avoid starvation scenarios, where a process fails to notice that additional file descriptors are ready because it focuses on a set of file descriptors that are already known to be ready.
Perusing through the source file for epoll, one sees that the ready events are maintained in a linked list. Events are removed from the head of the list and added to the end of the list.
Based on that, the answer is that the descriptor order is based on the order in which they became ready.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With