Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python HTTP client with request pipelining

Tags:

The problem: I need to send many HTTP requests to a server. I can only use one connection (non-negotiable server limit). The server's response time plus the network latency is too high – I'm falling behind.

The requests typically don't change server state and don't depend on the previous request's response. So my idea is to simply send them on top of each other, enqueue the response objects, and depend on the Content-Length: of the incoming responses to feed incoming replies to the next-waiting response object. In other words: Pipeline the requests to the server.

This is of course not entirely safe (any reply without Content-Length: means trouble), but I don't care -- in that case I can always retry any queued requests. (The safe way would be to wait for the header before sending the next bit. That'd might help me enough. No way to test beforehand.)

So, ideally I want the following client code (which uses client delays to mimic network latency) to run in three seconds.

Now for the $64000 question: Is there a Python library which already does this, or do I need to roll my own? My code uses gevent; I could use Twisted if necessary, but Twisted's standard connection pool does not support pipelined requests. I also could write a wrapper for some C library if necessary, but I'd prefer native code.

#!/usr/bin/python

import gevent.pool
from gevent import sleep
from time import time

from geventhttpclient import HTTPClient

url = 'http://local_server/100k_of_lorem_ipsum.txt'
http = HTTPClient.from_url(url, concurrency=1)

def get_it(http):
    print time(),"Queueing request"
    response = http.get(url)
    print time(),"Expect header data"
    # Do something with the header, just to make sure that it has arrived
    # (the greenlet should block until then)
    assert response.status_code == 200
    assert response["content-length"] > 0
    for h in response.items():
        pass

    print time(),"Wait before reading body data"
    # Now I can read the body. The library should send at
    # least one new HTTP request during this time.
    sleep(2)
    print time(),"Reading body data"
    while response.read(10000):
        pass
    print time(),"Processing my response"
    # The next request should definitely be transmitted NOW.
    sleep(1)
    print time(),"Done"

# Run parallel requests
pool = gevent.pool.Pool(3)
for i in range(3):
    pool.spawn(get_it, http)

pool.join()
http.close()
like image 559
Matthias Urlichs Avatar asked Oct 11 '13 07:10

Matthias Urlichs


2 Answers

Dugong is an HTTP/1.1-only client which claims to support real HTTP/1.1 pipelining. The tutorial includes several examples on how to use it, including one using threads and another using asyncio.

Be sure to verify that the server you're communicating with actually supports HTTP/1.1 pipelining—some servers claim to support HTTP/1.1 but don't implement pipelining.

like image 132
taleinat Avatar answered Sep 18 '22 12:09

taleinat


I think txrequests could get you most of what you are looking for, using the background_callback to en-queue processing of responses on a separate thread. Each request would still be it's own thread but using a session means by default it would reuse the same connection.

https://github.com/tardyp/txrequests#working-in-the-background

like image 29
Amos Baker Avatar answered Sep 19 '22 12:09

Amos Baker