I have an event-oriented server which already uses select.epoll().
Now a new requirement has to be solved: URLs should get fetched (asynchronously).
Up to now I have always used the requests library, and I have always used it synchronously, never asynchronously.
How can I use the requests library (or a different URL-fetching library) combined with Linux epoll?
The requests library docs have a note about this, but only async frameworks are mentioned there (not select.epoll()): http://docs.python-requests.org/en/master/user/advanced/#blocking-or-non-blocking
I am not married to select.epoll(). It has worked up to now. I can use a different solution, if feasible.
Background: the bigger question is "Should I use select.epoll() or one of the many async frameworks which Python has?". But questions on StackOverflow must not be too broad, which is why this question focuses on "retrieve several URLs via select.epoll()". If you have hints on the bigger question, please leave a comment.
If you are curious, this question is needed for a small project which I develop in my spare time: https://github.com/guettli/ipo (IPO is an open source asynchronous job queue which is based on PostgreSQL.)
Unfortunately you can't, unless the library has been built with this kind of integration in mind. epoll, like select/poll/kqueue, is an I/O multiplexing system call, and the overall program architecture needs to be built around it.
Simply put, a typical program structure boils down to the following:
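Sketched in Python (illustrative only; real code needs error handling, buffering, and a way to register and unregister sockets as connections come and go):

```python
import select

epoll = select.epoll()

def watch(sock):
    # Every socket is switched to non-blocking mode and registered,
    # so no call on it can ever stall the single event-loop thread.
    sock.setblocking(False)
    epoll.register(sock.fileno(), select.EPOLLIN | select.EPOLLOUT)

while True:
    # poll() blocks until at least one registered descriptor is ready
    # and returns (fileno, eventmask) pairs for the ready ones.
    for fileno, eventmask in epoll.poll():
        ...  # hand the ready descriptor over to the handling code
```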
After that, it is the outer code's job to handle these descriptors, i.e. figure out how much data has become available, call some callbacks, etc.
If the library uses regular blocking sockets, the only way to parallelize it is to use threads/processes. Here's a good article on the subject; the examples use C, and that's good as it's easier to understand what's actually happening under the hood.
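For example, a minimal sketch of the threads approach with stock requests and a thread pool (the URLs are placeholders):

```python
import requests
from concurrent.futures import ThreadPoolExecutor

urls = ["http://example.com/", "http://example.org/"]

# Each blocking requests.get() call runs in its own worker thread;
# map() yields the responses in the order of the input URLs.
with ThreadPoolExecutor(max_workers=10) as pool:
    for response in pool.map(requests.get, urls):
        print(response.status_code, response.url)
```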
Let's check out what's suggested there:
If you are concerned about the use of blocking IO, there are lots of projects out there that combine Requests with one of Python's asynchronicity frameworks. Some excellent examples are requests-threads, grequests, and requests-futures.
requests-threads - uses threads
grequests - integration with gevent (it’s a different story, see below)
requests-futures - in fact also threads/processes
None of them has anything to do with true asynchronicity.
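For illustration, here's roughly how requests-futures is used (FuturesSession is its real entry point; the URL is a placeholder). The "future" is fulfilled by a worker thread from a pool, not by an event loop:

```python
from requests_futures.sessions import FuturesSession

session = FuturesSession()                    # backed by a ThreadPoolExecutor
future = session.get("http://example.com/")   # returns immediately
response = future.result()                    # a worker thread blocks, not epoll
print(response.status_code)
```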
Please note that epoll is a Linux-specific beast; it won't work e.g. on OS X, which has a different mechanism called kqueue. As you appear to be writing a general-purpose job queue, epoll doesn't seem to be a good solution.
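If you do go down the multiplexing road yourself, the stdlib selectors module (Python 3.4+) hides that platform difference by picking the best available mechanism:

```python
import selectors

# DefaultSelector resolves to EpollSelector on Linux, KqueueSelector on
# OS X/BSD, and falls back to select() where nothing better exists.
sel = selectors.DefaultSelector()
print(type(sel).__name__)
```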
Now back to Python. You've got the following options:
threads/processes/concurrent.futures - unlikely to be what you're aiming at, as your app is a typical C10K server
epoll/kqueue - you'll have to do everything yourself. In the case of fetching HTTP URLs, you'll need to deal not only with http/ssl but also with asynchronous DNS resolution. Also consider using asyncore, which provides some basic infrastructure.
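To give a feeling for how much "everything" is, here is a stripped-down sketch of fetching a single plain-HTTP URL over a non-blocking socket with select.epoll() (Python 3, no TLS, no redirects, no chunked encoding; note that the DNS lookup inside connect() still blocks):

```python
import select
import socket

host, port, path = "example.com", 80, "/"
request = ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode()

sock = socket.socket()
sock.setblocking(False)
try:
    sock.connect((host, port))   # the DNS resolution here is still blocking!
except BlockingIOError:          # EINPROGRESS: the connect is under way
    pass

epoll = select.epoll()
epoll.register(sock.fileno(), select.EPOLLOUT)

response, done = b"", False
while not done:
    for fileno, events in epoll.poll():
        if events & select.EPOLLOUT:
            sock.sendall(request)                  # writable: send the request
            epoll.modify(fileno, select.EPOLLIN)   # now wait for the reply
        elif events & select.EPOLLIN:
            chunk = sock.recv(4096)
            if chunk:
                response += chunk
            else:                                  # server closed the connection
                done = True

epoll.unregister(sock.fileno())
epoll.close()
sock.close()
print(response.split(b"\r\n", 1)[0])               # the HTTP status line
```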
twisted/tornado - callback-based frameworks that already do all the low-level stuff for you
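A sketch of what that looks like with Tornado 4/5-era APIs (in Tornado 6+ on Python 3.5+ you'd write native async/await coroutines instead; the URLs are placeholders):

```python
from tornado import gen, ioloop
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_all(urls):
    client = AsyncHTTPClient()
    # Yielding a list of futures runs all the fetches concurrently
    # on the IOLoop and resumes once every one of them has finished.
    responses = yield [client.fetch(url) for url in urls]
    raise gen.Return(responses)

urls = ["http://example.com/", "http://example.org/"]
responses = ioloop.IOLoop.current().run_sync(lambda: fetch_all(urls))
for response in responses:
    print(response.code, response.effective_url)
```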
gevent - this is something you might like if you're going to reuse existing blocking libraries (urllib, requests, etc.) and use both Python 2.x and Python 3.x. But this solution is a hack by design. For an app of your size it might be OK, but I wouldn't use it for anything bigger that should be rock-solid and run in prod
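A sketch of the gevent way, reusing the unmodified requests library (the monkey-patching is exactly the hack-by-design part; the URLs are placeholders):

```python
from gevent import monkey
monkey.patch_all()   # must happen before requests/socket get imported

import gevent
import requests

urls = ["http://example.com/", "http://example.org/"]
# Each greenlet runs requests.get(); the patched socket module yields
# to the gevent event loop instead of blocking the thread.
jobs = [gevent.spawn(requests.get, url) for url in urls]
gevent.joinall(jobs, timeout=10)
for job in jobs:
    if job.successful():
        print(job.value.status_code, job.value.url)
```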
asyncio
This module provides infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives
It has everything you might need. There's also a bunch of libraries for working with popular RDBMSs and HTTP: https://github.com/aio-libs
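For example, a minimal sketch with aiohttp (one of the aio-libs projects) on Python 3.5+ (the URLs are placeholders):

```python
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        # gather() runs all the fetches concurrently on the event loop.
        return await asyncio.gather(*(fetch(session, url) for url in urls))

urls = ["http://example.com/", "http://example.org/"]
loop = asyncio.get_event_loop()
for body in loop.run_until_complete(fetch_all(urls)):
    print(len(body))
```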
But it lacks support for Python 2.x. There are ports of asyncio to Python 2.x, but I'm not sure how stable they are.
So if I could sacrifice Python 2.x, I'd personally go with asyncio and related libraries.
If you really, really need Python 2.x, use one of the approaches above, depending on the stability required and the assumed peak load.