http request with timeout, maximum size and connection pooling

Tags: python, http, timeout, connection-pooling, max-size

I'm looking for a way in Python (2.7) to do HTTP requests with 3 requirements:

timeout (for reliability)
content maximum size (for security)
connection pooling (for performance)

I've checked almost every Python HTTP library, but none of them meets all three requirements. For instance:

urllib2: good, but no connection pooling

import urllib2
import json

# Ask for one byte more than the limit: getting it back means the body is too large.
r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
content = r.read(100+1)
r.close()
if len(content) > 100:
    print 'too large'
else:
    print json.loads(content)

r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
content = r.read(100000+1)
r.close()
if len(content) > 100000:
    print 'too large'
else:
    print json.loads(content)
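The read-one-byte-more trick used above can be pulled out into a small helper. This is a minimal sketch, assuming any file-like object whose read(n) returns up to n bytes and stops short only at the end of the stream; io.BytesIO stands in for the urllib2 response so no network access is needed, and read_at_most is a hypothetical name, not part of urllib2.

```python
import io


def read_at_most(fileobj, maxsize):
    """Return the whole body of fileobj, or raise ValueError if it exceeds maxsize bytes.

    Requesting maxsize + 1 bytes is enough to tell "exactly at the limit"
    apart from "too large" without reading the rest of the body.
    """
    content = fileobj.read(maxsize + 1)
    if len(content) > maxsize:
        raise ValueError('Response too large')
    return content


# io.BytesIO stands in for a real HTTP response object here.
body = read_at_most(io.BytesIO(b'{"ok": true}'), 100)
```

Closing the response is still the caller's job; with urllib2 that means calling r.close() whether or not the helper raised.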

requests: no max size

import json
import requests

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)
r.headers.get('content-length') # None for this request, and cannot be trusted anyway
content = r.raw.read(100000+1)
print content # this is still gzip-compressed, so its length is not the real size
print json.loads(content) # fails: content is gzip-compressed, so pretty useless
print r.json() # no longer works once raw.read() has consumed the stream
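To see why the length of r.raw.read() is misleading, the gzip round trip can be reproduced locally with the standard library. This is a sketch with a made-up JSON payload: gzip_bytes (a hypothetical helper) plays the role of a server sending Content-Encoding: gzip, and gunzip_bytes recovers the decoded body that r.content and r.json() work with.

```python
import gzip
import io
import json


def gzip_bytes(data):
    # Compress data the way a server sending Content-Encoding: gzip would.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(data)
    return buf.getvalue()


def gunzip_bytes(data):
    # Undo the gzip encoding to recover the real body.
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode='rb') as gz:
        return gz.read()


# Made-up, highly repetitive JSON body standing in for timeline.json.
body = json.dumps([{'event': 'push'}] * 1000).encode('utf-8')
wire = gzip_bytes(body)       # like r.raw.read() without decoding
decoded = gunzip_bytes(wire)  # like r.content, which r.json() parses
```

Because len(wire) is much smaller than len(body), a size check on undecoded bytes lets through far more data than the limit suggests, and json.loads() cannot parse wire at all.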

urllib3: I never got the "read" method working, even with a 50 MB file...

httplib: httplib.HTTPConnection is not a pool (only one connection)
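For comparison, a pool is little more than a stack of idle connections kept for reuse. The sketch below is hypothetical (SimpleConnectionPool is not a real httplib or urllib3 class) and takes a factory such as lambda: httplib.HTTPConnection('github.com', timeout=5). requests.Session, which is backed by urllib3's connection pools, already does this properly, whereas each bare requests.get() call creates and discards its own session.

```python
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7


class SimpleConnectionPool(object):
    """Hypothetical minimal pool that reuses idle connections to one host.

    factory is any zero-argument callable returning a new connection object
    with a close() method, e.g. lambda: httplib.HTTPConnection(host, timeout=5).
    """

    def __init__(self, factory, maxsize=4):
        self._factory = factory
        # LIFO so the most recently used (likeliest still open) connection is reused first.
        self._idle = queue.LifoQueue(maxsize)

    def get(self):
        # Reuse an idle connection if there is one, otherwise open a new one.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return self._factory()

    def put(self, conn):
        # Hand a connection back for reuse; close it if the pool is already full.
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            conn.close()
```

An httplib connection can only be handed back after its previous response has been read completely; a fuller pool would also key connections by host and drop broken ones.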

I can hardly believe that urllib2 is the best HTTP library I can use! So if anyone knows which library can do this, or how to use one of the libraries above...

EDIT:

The best solution I found, thanks to Martijn Pieters' answer (a StringIO buffer does not slow down even for huge responses, whereas repeated str concatenation does):

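from StringIO import StringIO
import requests

maxsize = 100000
r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)
size = 0
ctt = StringIO()  # accumulate chunks in a buffer instead of repeated str concatenation

for chunk in r.iter_content(2048):
    size += len(chunk)
    ctt.write(chunk)
    if size > maxsize:
        r.close()
        raise ValueError('Response too large')

content = ctt.getvalue()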
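The same pattern works on any iterable of byte chunks, which makes it easy to test without a server. This is a sketch, assuming chunks stands in for r.iter_content(2048); collect_limited is a hypothetical helper, and it uses io.BytesIO because the chunks are bytes. Unlike the loop above, it checks the limit before buffering the chunk that crosses it.

```python
import io


def collect_limited(chunks, maxsize):
    """Join byte chunks into one body, raising ValueError past maxsize bytes.

    chunks stands in for r.iter_content(2048); any iterable of bytes works.
    Iteration stops at the first chunk that crosses the limit.
    """
    size = 0
    buf = io.BytesIO()
    for chunk in chunks:
        size += len(chunk)
        if size > maxsize:
            # Bail out before buffering the chunk that crosses the limit.
            raise ValueError('Response too large')
        buf.write(chunk)
    return buf.getvalue()
```

The helper does not close anything itself, so the caller should still call r.close() (for example in a try/finally) when ValueError is raised.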


asked May 07 '14 09:05

Aurélien Lambert

1 Answer

You can do this with requests just fine, but you need to know that the raw object is part of urllib3's internals, and make use of the extra arguments that the HTTPResponse.read() call supports, which let you specify that you want to read decoded data:

import json
import requests

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

# decode_content=True makes urllib3 undo the gzip/deflate Content-Encoding
content = r.raw.read(100000+1, decode_content=True)
if len(content) > 100000:
    raise ValueError('Too large a response')
print content
print json.loads(content)

Alternatively, you can set the decode_content flag on the raw object before reading:

import json
import requests

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

# every subsequent raw.read() call now returns decoded data
r.raw.decode_content = True
content = r.raw.read(100000+1)
if len(content) > 100000:
    raise ValueError('Too large a response')
print content
print json.loads(content)

If you don't like reaching into urllib3's internals like that, use response.iter_content() to iterate over the decoded content in chunks; this uses the underlying HTTPResponse too (via its .stream() generator version):

import json
import requests

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

maxsize = 100000
content = ''
for chunk in r.iter_content(2048):  # chunks are already decoded
    content += chunk
    if len(content) > maxsize:
        r.close()
        raise ValueError('Response too large')

print content
print json.loads(content)

There is a subtle difference in how compressed data sizes are handled here: r.raw.read(100000+1) will only ever read 100k bytes of compressed data, and the uncompressed result is then tested against your max size. The iter_content() method will read more uncompressed data in the rare case that the compressed stream is larger than the uncompressed data.
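Because gzip can shrink repetitive data by orders of magnitude, a cap on compressed bytes says little about the decoded size. Below is a minimal sketch of bounding the decoded size directly with the standard zlib module; gunzip_limited is a hypothetical helper, not how requests or urllib3 decode internally, and compressed_chunks stands in for undecoded reads from r.raw.

```python
import zlib


def gunzip_limited(compressed_chunks, maxsize):
    """Decode gzip data chunk by chunk; raise ValueError past maxsize decoded bytes."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = []
    size = 0
    for chunk in compressed_chunks:
        # Never ask zlib for more than one byte past the limit, so a small
        # compressed chunk cannot expand into a huge buffer in memory.
        data = decoder.decompress(chunk, maxsize + 1 - size)
        size += len(data)
        if size > maxsize:
            raise ValueError('Response too large')
        # Fewer bytes than requested means zlib consumed the whole chunk
        # (or reached the end of the gzip stream).
        out.append(data)
    tail = decoder.flush()
    size += len(tail)
    if size > maxsize:
        raise ValueError('Response too large')
    out.append(tail)
    return b''.join(out)
```

With one megabyte of zeros gzip-compressed to well under 10 KB, a 100 KB cap on compressed bytes would not stop it, while this helper raises as soon as the decoded output passes the limit.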

Neither method allows r.json() to work, because neither sets the response._content attribute; you could set it manually, of course. But since the .raw.read() and .iter_content() calls already give you access to the content in question, there is really no need.


answered Oct 18 '22 04:10

Martijn Pieters
