I'm using Requests to download a file (several gigabytes) from a server. To provide progress updates (and to prevent the entire file from having to be stored in memory) I've set stream=True and write the download to a file:
import requests

with open('output', 'w') as f:
    response = requests.get(url, stream=True)
    if not response.ok:
        print 'There was an error'
        exit()
    for block in response.iter_content(1024 * 100):
        f.write(block)
        completed_bytes += len(block)
        write_progress(completed_bytes, total_bytes)
However, at some random point in the download, Requests throws a ChunkedEncodingError. I've gone into the source and found that this corresponds to an IncompleteRead exception. I inserted a log statement around those lines and found that e.partial = "\r". I know that the server gives the downloads low priority, and I suspect that this exception occurs when the server waits too long to send the next chunk.
As is expected, the exception stops the download. Unfortunately, the server does not implement HTTP/1.1's content ranges, so I cannot simply resume it. I've played around with increasing urllib3's internal timeout, but the exception still persists.
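For reference, this is roughly the kind of thing I tried; the 300-second value and the url variable are just placeholders rather than my real settings, and passing timeout to requests.get sets the socket timeout that urllib3 uses under the hood:

import requests

url = 'http://example.com/large-file'  # placeholder

# A generous socket timeout; the value is arbitrary. Even with this,
# the ChunkedEncodingError still appears at some point.
response = requests.get(url, stream=True, timeout=300)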
Is there any way to make the underlying urllib3 (or Requests) more tolerant of these empty (or late) chunks so that the file can download completely?
import httplib

def patch_http_response_read(func):
    def inner(*args):
        try:
            return func(*args)
        except httplib.IncompleteRead, e:
            # Swallow the IncompleteRead and return whatever data did arrive.
            return e.partial
    return inner

# Replace HTTPResponse.read process-wide so every response gets the tolerant behaviour.
httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)
I cannot reproduce your problem right now, but I think this patch could help. It lets you deal with defective HTTP servers.
Most bad servers transmit all of the data, but due to implementation errors they close the session incorrectly, so httplib raises an error and buries your precious bytes.
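A minimal usage sketch, assuming the patch above is applied once at the top of the script before any requests are made (url is a placeholder):

import requests

# Because the patch replaces httplib.HTTPResponse.read for the whole process,
# the original streaming loop can stay unchanged; a truncated chunked response
# should now yield whatever partial data arrived instead of raising.
url = 'http://example.com/large-file'  # placeholder
response = requests.get(url, stream=True)
with open('output', 'wb') as f:
    for block in response.iter_content(1024 * 100):
        f.write(block)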