
python s3 boto connection.close causes an error

I have code that writes files to S3. It had been working fine:

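    import json

    from boto.s3.connection import S3Connection
    from boto.s3.key import Key

    conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    # validate=False skips the extra request that checks the bucket exists
    bucket = conn.get_bucket(BUCKET, validate=False)
    k = Key(bucket)
    k.key = self.filekey
    k.set_metadata('Content-Type', 'text/javascript')
    k.set_contents_from_string(json.dumps(self.output))
    k.set_acl(FILE_ACL)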

Then I noticed I wasn't closing my connection, so I added this line at the end:

    conn.close()

Now the file is written as before, but I'm seeing this error in my logs:

    S3Connection instance has no attribute '_cache', unable to write file 

Anyone see what I'm doing wrong here or know what's causing this? I noticed that none of the tutorials on boto show people closing connections but I know you should close your connections for IO operations as a general rule...

EDIT: When I comment out conn.close(), the error disappears.

Brad asked Aug 22 '13 14:08


1 Answer

I can't find that error message in the latest boto source code, so unfortunately I can't tell you what caused it. Recently, we had problems when we were NOT calling conn.close(), so there definitely is at least one case where you must close the connection. Here's my understanding of what's going on:

S3Connection (well, its parent class) handles almost all connectivity details transparently, and you shouldn't have to think about closing resources, reconnecting, etc. This is why most tutorials and docs don't mention closing resources. In fact, I only know of one situation where you should close resources explicitly, which I describe at the bottom. Read on!

Under the covers, boto uses httplib. This client library supports HTTP 1.1 Keep-Alive, so it can and should keep the socket open so that it can perform multiple requests over the same connection.
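To make the keep-alive behaviour concrete, here is a minimal sketch using Python 3's http.client (the module that httplib was renamed to). The local test server, the Handler class, and the loop count are assumptions for illustration and have nothing to do with boto or S3; the only point is that several requests can travel over one socket.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical local server used only to demonstrate keep-alive."""

    # HTTP/1.1 makes the server keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
local_ports = []
for _ in range(3):
    conn.request("GET", "/")
    response = conn.getresponse()
    response.read()  # the body must be consumed before the next request
    # Record the client-side port; it changes only if a new socket is opened.
    local_ports.append(conn.sock.getsockname()[1])

conn.close()
server.shutdown()
server.server_close()

print(local_ports)
```

If the three recorded ports are identical, all three requests went over the same TCP connection, which is what keep-alive buys you.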

AWS will close your connection (socket) for two reasons:

  1. According to the boto source code, "AWS starts timing things out after three minutes." Presumably "things" means "idle connections."
  2. According to Best Practices for Using Amazon S3, "S3 will accept up to 100 requests before it closes a connection (resulting in 'connection reset')."

Fortunately, boto works around the first case by recycling stale connections well before three minutes are up. Unfortunately, boto doesn't handle the second case quite so transparently:

When AWS closes a connection, your end of the connection goes into CLOSE_WAIT, which means that the socket is waiting for the application to execute close(). S3Connection handles connectivity details so transparently that you cannot actually do this directly! It's best to prevent it from happening in the first place.
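The CLOSE_WAIT situation can be reproduced with plain sockets. This is only a sketch under stated assumptions: a local listening socket stands in for S3, and the variable names are made up for the example. It shows the remote side closing first and the local side noticing end-of-stream, after which the local socket stays in CLOSE_WAIT until the application calls close().

```python
import socket

# A local listener stands in for the remote service (an assumption for this demo).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

# The "server" closes first, as S3 does when it drops a connection.
server_side.close()

# The client sees end-of-stream: recv() returns b"" once the peer has closed.
client.settimeout(5)
data = client.recv(1024)
peer_closed = (data == b"")

# From here until the next line runs, the client socket is in CLOSE_WAIT.
client.close()
listener.close()

print("peer closed first:", peer_closed)
```

A process that never reaches its own close() call accumulates sockets in exactly this state.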

So, circling back to the original question of when you need to close explicitly: if your application runs for a long time, keeps a reference to (reuses) a boto connection for a long time, and makes many boto S3 requests over that connection (thus triggering a "connection reset" on the socket by AWS), then you may find that more and more sockets are in CLOSE_WAIT. You can check for this condition on Linux by calling netstat | grep CLOSE_WAIT. To prevent this, make an explicit call to boto's connection.close before you've made 100 requests. We make hundreds of thousands of S3 requests in a long-running process, and we call connection.close after every, say, 80 requests.
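One way to arrange the "close every 80 requests" habit is a small counter around the connection. This is a rough sketch, not our actual code: CloseEveryN and FakeConnection are hypothetical names, and the fake object exists only so the example runs without boto or AWS credentials. The only thing it relies on is that the wrapped object has a close() method.

```python
class CloseEveryN:
    """Call conn.close() after every `limit` recorded requests (hypothetical helper)."""

    def __init__(self, conn, limit=80):
        self.conn = conn
        self.limit = limit
        self.count = 0

    def used(self, requests=1):
        """Record that requests were made; close once the limit is reached."""
        self.count += requests
        if self.count >= self.limit:
            self.conn.close()
            self.count = 0
            return True
        return False


class FakeConnection:
    """Stand-in for a boto connection so the sketch runs without AWS access."""

    def __init__(self):
        self.closes = 0

    def close(self):
        self.closes += 1


fake = FakeConnection()
tracker = CloseEveryN(fake, limit=80)
for _ in range(200):
    # ... one S3 request would go here ...
    tracker.used()

print(fake.closes, tracker.count)
```

With boto, `conn` would be the S3Connection from the question and `used()` would be called after each request. This assumes, as described above, that the connection reconnects on its own the next time it is used after close().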

jtoberon answered Sep 30 '22 17:09