I have several daemons that read many files from Amazon S3 using boto. Once every couple of days, I run into a situation where an httplib.IncompleteRead is thrown from deep inside boto. If I retry the request, it immediately fails with another IncompleteRead. Even if I call bucket.connection.close(), all further requests still error out.
I feel like I might have stumbled across a bug in boto here, but nobody else seems to have hit it. Am I doing something wrong? All of the daemons are single-threaded, and I've tried setting is_secure both ways.
Traceback (most recent call last):
  ...
  File "file_wrapper.py", line 22, in next
    line = self.readline()
  File "file_wrapper.py", line 37, in readline
    data = self.fh.read(self.buffer_size)
  File "virtualenv/lib/python2.6/site-packages/boto/s3/key.py", line 378, in read
    self.close()
  File "virtualenv/lib/python2.6/site-packages/boto/s3/key.py", line 349, in close
    self.resp.read()
  File "virtualenv/lib/python2.6/site-packages/boto/connection.py", line 411, in read
    self._cached_response = httplib.HTTPResponse.read(self)
  File "/usr/lib/python2.6/httplib.py", line 529, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python2.6/httplib.py", line 621, in _safe_read
    raise IncompleteRead(''.join(s), amt)
Environment:
I've been struggling with this problem for a while, running long-lived processes that read large amounts of data from S3. I decided to post my solution here for posterity.
First of all, I'm sure the hack pointed to by @Glenn works, but I chose not to use it because I consider it intrusive (it monkey-patches httplib) and unsafe (it blindly returns what it got, i.e. return e.partial, even though the incomplete read may reflect a real error).
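For reference, the exception object itself carries whatever data did arrive before the connection broke, which is exactly what that hack returns. A minimal illustration (the byte string and expected count here are arbitrary values, made up for demonstration):

```python
try:
    from httplib import IncompleteRead        # Python 2 name
except ImportError:
    from http.client import IncompleteRead    # Python 3 name

# Construct the exception the way httplib does internally:
# the partial body received, plus the number of bytes still expected.
e = IncompleteRead(b'partial body', 42)

# e.partial is the data that arrived; e.expected is what was still owed.
partial, expected = e.partial, e.expected
```

This is why returning e.partial blindly is risky: the caller gets a truncated body with no indication that anything went wrong.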
Here is the solution I finally came up with, which seems to be working.
I'm using this general-purpose retrying function:
import time, logging, httplib, socket

def run_with_retries(func, num_retries, sleep=None, exception_types=Exception, on_retry=None):
    for i in range(num_retries):
        try:
            return func()  # call the function
        except exception_types, e:
            # failed on one of the known exceptions
            if i == num_retries - 1:
                raise  # this was the last attempt; re-raise
            logging.warning('operation %s failed with error %s. will retry %d more times',
                            func, e, num_retries - i - 1)
            if on_retry is not None:
                on_retry()
            if sleep is not None:
                time.sleep(sleep)
    assert 0  # should not reach this point
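To sanity-check the retry logic without touching S3, here is a self-contained sketch with a deliberately flaky function (flaky and its attempt counter are hypothetical, made up purely for illustration; the helper is repeated inline, spelled with "except ... as e" so it runs on Python 2.6+ and 3):

```python
import logging
import time

def run_with_retries(func, num_retries, sleep=None, exception_types=Exception, on_retry=None):
    # Same logic as the helper above, in both-Pythons-compatible syntax.
    for i in range(num_retries):
        try:
            return func()
        except exception_types as e:
            if i == num_retries - 1:
                raise  # last attempt; re-raise
            logging.warning('operation %s failed with error %s. will retry %d more times',
                            func, e, num_retries - i - 1)
            if on_retry is not None:
                on_retry()
            if sleep is not None:
                time.sleep(sleep)

# A hypothetical flaky operation: fails twice, then succeeds on the third call.
attempts = {'count': 0}

def flaky():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise IOError('transient failure')
    return 'ok'

result = run_with_retries(flaky, num_retries=3, sleep=0.01, exception_types=IOError)
```

With num_retries=3, the first two IOErrors are swallowed (with a warning logged) and the third call returns normally; had all three attempts failed, the last exception would have been re-raised to the caller.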
Now, when reading a file from S3, I use this function, which internally performs retries on IncompleteRead errors. Upon an error, before retrying, I call key.close().
def read_s3_file(key):
    """
    Reads the entire contents of a file on S3.
    @param key: a boto.s3.key.Key instance
    """
    return run_with_retries(
        key.read, num_retries=3, sleep=0.5,
        exception_types=(httplib.IncompleteRead, socket.error),
        # close the connection before retrying
        on_retry=lambda: key.close()
    )
It may well be a bug in boto, but the symptoms you describe are not unique to it. See:
IncompleteRead using httplib
https://dev.twitter.com/discussions/9554
Since httplib appears in your traceback, one solution is proposed here:
http://bobrochel.blogspot.in/2010/11/bad-servers-chunked-encoding-and.html?showComment=1358777800048
Disclaimer: I have no experience with boto. This is based on research only and posted since there have been no other responses.