The problem: I need to send many HTTP requests to a server. I can only use one connection (non-negotiable server limit). The server's response time plus the network latency is too high – I'm falling behind.
The requests typically don't change server state and don't depend on the previous request's response. So my idea is to simply send them on top of each other, enqueue the response objects, and depend on the Content-Length: of the incoming responses to feed incoming replies to the next-waiting response object. In other words: Pipeline the requests to the server.
This is of course not entirely safe (any reply without Content-Length: means trouble), but I don't care -- in that case I can always retry any queued requests. (The safe way would be to wait for the header before sending the next bit. That might help me enough. No way to test beforehand.)
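The framing step this idea relies on can be sketched roughly as follows. This is a minimal Python 3 sketch, not taken from any library: `split_responses` is a hypothetical helper, and it assumes every reply carries Content-Length (no chunked encoding, no HEAD, 1xx, 204 or 304 replies). A reply without Content-Length raises, which is the point where queued requests would have to be retried.

```python
def split_responses(buf):
    """Split concatenated HTTP/1.1 responses found in `buf` (bytes).

    Returns (responses, rest): a list of complete (status, headers, body)
    tuples, plus the leftover bytes that do not yet form a full response.
    """
    responses = []
    while True:
        head_end = buf.find(b"\r\n\r\n")
        if head_end < 0:
            break  # header block has not fully arrived yet
        head = buf[:head_end].decode("iso-8859-1")
        status_line, *header_lines = head.split("\r\n")
        status = int(status_line.split(" ", 2)[1])
        headers = {}
        for line in header_lines:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        if "content-length" not in headers:
            # Without a length there is no way to tell where this reply ends.
            raise ValueError("no Content-Length: cannot frame pipelined reply")
        length = int(headers["content-length"])
        body_start = head_end + 4
        if len(buf) < body_start + length:
            break  # body has not fully arrived yet
        responses.append((status, headers, buf[body_start:body_start + length]))
        buf = buf[body_start + length:]
    return responses, buf
```

Each completed tuple would be handed to the oldest waiting response object, since replies arrive in request order; `rest` is kept and prepended to the next chunk read from the socket.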
So, ideally I want the following client code (which uses client delays to mimic network latency) to run in three seconds.
Now for the $64000 question: Is there a Python library which already does this, or do I need to roll my own? My code uses gevent; I could use Twisted if necessary, but Twisted's standard connection pool does not support pipelined requests. I also could write a wrapper for some C library if necessary, but I'd prefer native code.
#!/usr/bin/python
import gevent.pool
from gevent import sleep
from time import time
from geventhttpclient import HTTPClient

url = 'http://local_server/100k_of_lorem_ipsum.txt'
http = HTTPClient.from_url(url, concurrency=1)

def get_it(http):
    print(time(), "Queueing request")
    response = http.get(url)
    print(time(), "Expect header data")
    # Do something with the header, just to make sure that it has arrived
    # (the greenlet should block until then)
    assert response.status_code == 200
    # Header values are strings, so convert before comparing
    assert int(response["content-length"]) > 0
    for h in response.items():
        pass
    print(time(), "Wait before reading body data")
    # Now I can read the body. The library should send at
    # least one new HTTP request during this time.
    sleep(2)
    print(time(), "Reading body data")
    while response.read(10000):
        pass
    print(time(), "Processing my response")
    # The next request should definitely be transmitted NOW.
    sleep(1)
    print(time(), "Done")

# Run parallel requests
pool = gevent.pool.Pool(3)
for i in range(3):
    pool.spawn(get_it, http)
pool.join()
http.close()
Dugong is an HTTP/1.1-only client which claims to support real HTTP/1.1 pipelining. The tutorial includes several examples on how to use it, including one using threads and another using asyncio.
Be sure to verify that the server you're communicating with actually supports HTTP/1.1 pipelining—some servers claim to support HTTP/1.1 but don't implement pipelining.
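One rough way to run that check is to send several requests back to back on a raw socket and count how many replies come back. The following is a minimal Python 3 sketch using only the standard library; `count_pipelined_replies` is a hypothetical helper name, and it assumes plain HTTP (no TLS) and replies framed by Content-Length.

```python
import socket

def count_pipelined_replies(host, port, paths, timeout=5.0):
    """Send all GET requests in one write, then count the replies received."""
    requests = []
    for i, path in enumerate(paths):
        # Ask the server to close after the last request so we can read to EOF.
        conn = "close" if i == len(paths) - 1 else "keep-alive"
        requests.append(
            "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: {}\r\n\r\n".format(
                path, host, conn))
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall("".join(requests).encode("ascii"))
        data = b""
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            data += chunk
    count = 0
    while data:
        head, sep, rest = data.partition(b"\r\n\r\n")
        if not sep:
            break  # truncated header block
        length = None
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            count += 1  # body runs until the connection closed
            break
        if len(rest) < length:
            break  # truncated body
        count += 1
        data = rest[length:]
    return count
```

If the returned count is lower than `len(paths)`, or the call times out, the server (or a proxy in front of it) is probably not handling pipelined requests.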
I think txrequests could get you most of what you are looking for, using the background_callback to enqueue processing of responses on a separate thread. Each request would still be its own thread, but using a session means that by default it would reuse the same connection.
https://github.com/tardyp/txrequests#working-in-the-background