I have seen a lot of comparisons that say select has to walk through the fd list, and that this is slow. But why doesn't epoll have to do this?


Why is epoll faster than select?


2 Answers

There's a lot of misinformation about this, but the real reason is this:

A typical server might be dealing with, say, 200 connections. It will service every connection that needs to have data written or read and then it will need to wait until there's more work to do. While it's waiting, it needs to be interrupted if data is received on any of those 200 connections.

With select, the kernel has to add the process to 200 wait lists, one for each connection. To do this, it needs a "thunk" to attach the process to the wait list. When the process finally does wake up, it needs to be removed from all 200 wait lists and all those thunks need to be freed.
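From user space, this per-call cost shows up as having to hand the full interest list to the kernel every time. Below is a minimal sketch using Python's standard `select.select`; the helper names `wait_with_select` and `demo_select`, the use of `socket.socketpair()` as stand-in connections, and the 0.1 second timeout are all assumptions made for illustration, not part of the original answer.

```python
import select
import socket


def wait_with_select(socks, rounds, timeout=0.1):
    """Call select() `rounds` times over the same sockets.

    Returns (total fds handed to the kernel, number ready in each round).
    """
    fds_passed = 0
    ready_log = []
    for _ in range(rounds):
        # The whole interest list is passed in again on every call; the
        # kernel keeps no record of it between calls.
        readable, _w, _x = select.select(socks, [], [], timeout)
        fds_passed += len(socks)
        ready_log.append(len(readable))
        for s in readable:
            s.recv(4096)  # drain so the socket goes idle again
    return fds_passed, ready_log


def demo_select(n_conns=200, rounds=3):
    """Hypothetical demo: n_conns idle connections, one of which gets data."""
    pairs = [socket.socketpair() for _ in range(n_conns)]
    readers = [a for a, _b in pairs]
    try:
        pairs[0][1].send(b"x")  # only one connection ever has data
        return wait_with_select(readers, rounds)
    finally:
        for a, b in pairs:
            a.close()
            b.close()
```

Calling `demo_select()` hands 200 descriptors to `select` on every round, even though at most one of them has anything to report.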

By contrast, with epoll, the epoll instance itself has a wait list. The process needs to be put on only that one wait list using only one thunk. When the process wakes up, it needs to be removed from only one wait list and only one thunk needs to be freed.

To be clear, with epoll, the epoll instance itself has to be attached to each of those 200 connections. But this is done once, for each connection, when it is accepted in the first place. And this is torn down once, for each connection, when it is removed. By contrast, each call to select that blocks must add the process to every wait queue for every socket being monitored.
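The register-once, wait-many-times pattern looks roughly like this with Python's `select.epoll` (Linux only). This is a sketch under assumptions: `demo_epoll` is a hypothetical helper, `socket.socketpair()` stands in for accepted connections, and the default level-triggered mode is used.

```python
import select
import socket


def demo_epoll(n_conns=200, rounds=3, timeout=0.1):
    """Register n_conns sockets once, then wait on them `rounds` times.

    Returns (number of registrations, number ready in each round).
    """
    pairs = [socket.socketpair() for _ in range(n_conns)]
    by_fd = {a.fileno(): a for a, _b in pairs}
    ep = select.epoll()
    registrations = 0
    ready_log = []
    try:
        # One-time setup: each connection is attached to the epoll
        # instance once, not once per wait.
        for fd in by_fd:
            ep.register(fd, select.EPOLLIN)
            registrations += 1

        pairs[0][1].send(b"x")  # only one connection ever has data

        for _ in range(rounds):
            # No interest list is passed here; only ready fds come back.
            events = ep.poll(timeout)
            ready_log.append(len(events))
            for fd, _mask in events:
                by_fd[fd].recv(4096)  # drain so the socket goes idle again
        return registrations, ready_log
    finally:
        ep.close()
        for a, b in pairs:
            a.close()
            b.close()
```

The number of registrations depends only on how many connections exist, not on how many times the loop waits.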

Ironically, with select, the largest cost comes from checking whether sockets that have had no activity have had any activity. With epoll, there is no need to check sockets that have had no activity because, if they did have activity, they would have informed the epoll instance when that activity happened. In a sense, select polls each socket each time you call select to see if there's any activity, while epoll rigs it so that the socket activity itself notifies the process.
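The wait-list bookkeeping described in this answer can be tallied with simple arithmetic. The two functions below are a back-of-the-envelope model, not a measurement: they count only attach and detach operations on wait lists, assume every call blocks, and their names are made up for this illustration.

```python
def select_wait_queue_ops(n_fds, n_calls):
    """Each blocking select() attaches to and detaches from every fd's wait list."""
    return 2 * n_fds * n_calls


def epoll_wait_queue_ops(n_fds, n_calls):
    """epoll attaches to each fd once and detaches once over its lifetime;
    each blocking wait then touches only the epoll instance's own wait list."""
    one_time = 2 * n_fds   # attach at registration, detach at removal
    per_call = 2 * n_calls  # one attach and one detach per blocking wait
    return one_time + per_call
```

Under this model, 200 connections and 1,000 blocking waits cost 400,000 wait-list operations with select and 2,400 with epoll.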

158

answered Sep 22 '22 02:09

David Schwartz

The main difference between epoll and select is that in select() the list of file descriptors to wait on only exists for the duration of a single select() call, and the calling task only stays on the sockets' wait queues for the duration of a single call. In epoll, on the other hand, you create a single file descriptor that aggregates events from multiple other file descriptors you want to wait on, and so the list of monitored fd's is long-lasting, and tasks stay on socket wait queues across multiple system calls. Furthermore, since an epoll fd can be shared across multiple tasks, it is no longer a single task on the wait queue, but a structure that itself contains another wait queue, containing all processes currently waiting on the epoll fd. (In terms of implementation, this is abstracted over by the sockets' wait queues holding a function pointer and a void* data pointer to pass to that function).

So, to explain the mechanics a little more:

1. An epoll file descriptor has a private struct eventpoll that keeps track of which fd's are attached to it. struct eventpoll also has a wait queue that keeps track of all processes that are currently blocked in epoll_wait on this fd, and a ready list of all monitored file descriptors that are currently available for reading or writing.
2. When you add a file descriptor to an epoll fd using epoll_ctl(), epoll adds the struct eventpoll to that fd's wait queue. It also checks whether the fd is already ready for processing and, if so, adds it to the ready list.
3. When you wait on an epoll fd using epoll_wait(), the kernel first checks the ready list and returns immediately if any file descriptors are already ready. If not, the calling task adds itself to the single wait queue inside struct eventpoll and goes to sleep.
4. When an event occurs on a socket that is being monitored by epoll, the socket calls the epoll callback, which adds the file descriptor to the ready list and wakes up any waiters that are currently waiting on that struct eventpoll.
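The steps above can be mimicked with a toy model in plain Python. This is not kernel code: `ToySocket` and `ToyEventPoll` are invented names, there is no locking or real sleeping, and the level-triggered versus edge-triggered distinction is glossed over. It only shows that nothing in the wait path loops over the whole interest set.

```python
class ToySocket:
    """Stand-in for a monitored fd; not a real socket."""

    def __init__(self, name):
        self.name = name
        self.readable = False
        self.wait_queue = []  # callbacks to run when an event occurs

    def deliver(self):
        """Simulate data arriving: notify whoever is attached."""
        self.readable = True
        for callback in self.wait_queue:
            callback(self)


class ToyEventPoll:
    """Stand-in for struct eventpoll: interest set plus ready list."""

    def __init__(self):
        self.interest = set()
        self.ready = []
        self.wakeups = 0  # stands in for waking sleeping waiters

    def ctl_add(self, sock):
        # Step 2: attach once, and check whether the fd is already ready.
        self.interest.add(sock)
        sock.wait_queue.append(self._on_event)
        if sock.readable:
            self.ready.append(sock)

    def _on_event(self, sock):
        # Step 4: the socket's own activity puts it on the ready list.
        if sock not in self.ready:
            self.ready.append(sock)
        self.wakeups += 1

    def wait(self):
        # Step 3: hand back the ready list without scanning self.interest.
        ready, self.ready = self.ready, []
        for sock in ready:
            sock.readable = False  # pretend the caller consumed the data
        return ready
```

With 200 `ToySocket` objects registered, a single `deliver()` call makes `wait()` return a one-element list, and the other 199 sockets are never inspected.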

Obviously, a lot of careful locking is needed on struct eventpoll and the various lists and wait queues, but that's an implementation detail.

The important thing to note is that at no point above did I describe a step that loops over all file descriptors of interest. By being entirely event-based and by using a long-lasting set of fd's and a ready list, epoll can avoid ever taking O(n) time for an operation, where n is the number of file descriptors being monitored.

answered Sep 23 '22 02:09

elite21


Tags: select, epoll

amazingjxq
