Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the differences between poll and select?

Tags:

c

unix

select

The select() call has you create three bitmasks to mark which sockets and file descriptors you want to watch for reading, writing, and errors, and then the operating system marks which ones in fact have had some kind of activity; poll() has you create a list of descriptor IDs, and the operating system marks each of them with the kind of event that occurred.

The select() method is rather clunky and inefficient.

  1. There are typically more than a thousand potential file descriptors available to a process. If a long-running process has only a few descriptors open, but at least one of them has been assigned a high number, then the bitmask passed to select() has to be large enough to accomodate that highest descriptor — so whole ranges of hundreds of bits will be unset that the operating system has to loop across on every select() call just to discover that they are unset.

  2. Once select() returns, the caller has to loop over all three bitmasks to determine what events took place. In very many typical applications only one or two file descriptors will get new traffic at any given moment, yet all three bitmasks must be read all the way to the end to discover which descriptors those are.

  3. Because the operating system signals you about activity by rewriting the bitmasks, they are ruined and are no longer marked with the list of file descriptors you want to listen to. You either have to rebuild the whole bitmask from some other list that you keep in memory, or you have to keep a duplicate copy of each bitmask and memcpy() the block of data over on top of the ruined bitmasks after each select() call.

So the poll() approach works much better because you can keep re-using the same data structure.

In fact, poll() has inspired yet another mechanism in modern Linux kernels: epoll() which improves even more upon the mechanism to allow yet another leap in scalability, as today's servers often want to handle tens of thousands of connections at once. This is a good introduction to the effort:

http://scotdoyle.com/python-epoll-howto.html

While this link has some nice graphs showing the benefits of epoll() (you will note that select() is by this point considered so inefficient and old-fashioned that it does not even get a line on these graphs!):

http://lse.sourceforge.net/epoll/index.html


Update: Here is another Stack Overflow question, whose answer gives even more detail about the differences:

Caveats of select/poll vs. epoll reactors in Twisted


I think that this answers your question:

From Richard Stevens ([email protected]):

The basic difference is that select()'s fd_set is a bit mask and therefore has some fixed size. It would be possible for the kernel to not limit this size when the kernel is compiled, allowing the application to define FD_SETSIZE to whatever it wants (as the comments in the system header imply today) but it takes more work. 4.4BSD's kernel and the Solaris library function both have this limit. But I see that BSD/OS 2.1 has now been coded to avoid this limit, so it's doable, just a small matter of programming. :-) Someone should file a Solaris bug report on this, and see if it ever gets fixed.

With poll(), however, the user must allocate an array of pollfd structures, and pass the number of entries in this array, so there's no fundamental limit. As Casper notes, fewer systems have poll() than select, so the latter is more portable. Also, with original implementations (SVR3) you could not set the descriptor to -1 to tell the kernel to ignore an entry in the pollfd structure, which made it hard to remove entries from the array; SVR4 gets around this. Personally, I always use select() and rarely poll(), because I port my code to BSD environments too. Someone could write an implementation of poll() that uses select(), for these environments, but I've never seen one. Both select() and poll() are being standardized by POSIX 1003.1g.

October 2017 Update:

The email referenced above is at least as old as 2001; the poll() command is now (2017) supported across all modern operating systems - including BSD. In fact, some people believe that select() should be deprecated. Opinions aside, portability issues around poll() are no longer a concern on modern systems. Furthermore, epoll() has since been developed (you can read the man page), and continues to rise in popularity.

For modern development you probably don't want to use select(), although there's nothing explicitly wrong with it. poll(), and it's more modern evolution epoll(), provide the same features (and more) as select() without suffering from the limitations therein.


Both of them are slow and mostly the same, But different in size and some kind of features!

When you write an iterator, You need to copy the set of select every time! While poll has fixed this kind of problem to have beautiful code. Another difference is that poll can handle more than 1024 file descriptors (FDs) by default. poll can handle different events to make the program more readable instead of having a lot of variables to handle this kind of job. Operations in poll and select is linear and slow because of having a lot of checks.