Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

downloading from AWS S3 while file is being updated

This may seem like a really basic question, but if I am downloading a file from S3 while it is being updated by another process, do I have to worry about getting an incomplete file?

Example: a 200MB CSV file. User A starts to update the file with 200MB of new content at 1Mbps. 16 seconds later, User B starts download the file at 200Mbps. Does User B get all 200MB of the original file, or does User B get ~2MB of User A's changes and nothing else?

like image 490
Jason Priebe Avatar asked Mar 03 '16 19:03

Jason Priebe


1 Answers

User B gets all 200MB of the original file.

Here's why:

PUT operations on S3 are atomic. There's technically no such thing as "modifying" an object. What actually happens when an object is overwritten is that the object is replaced with another object having the same key. But the original object is not actually replaced until the new (overwriting) object is uploaded in its entirety, and successfully...and even then, the overwritten object is not technically "gone" yet -- it's only been replaced in the bucket's index, so that future requests will be served the new object.

(Serving the new object is actually documented as not being guaranteed to always happen immediately. In contrast with uploads of new objects, which are immediately available for download, overwrites of existing objects are eventually consistent, meaning that it's possible -- however unlikely -- that for a short period of time after you upload an object that the old copy could still be served up for subsequent requests).

But when you overwrite an object, and versioning is not enabled on the bucket, the old object and new objects are actually stored independently in S3, in spite of the same key. The old object is now no longer referenced by the bucket's index, so you are no longer billed for storage of it, and it will shortly be purged from S3's backing store. It's not actually documented how much later this happens... but (tl;dr) overwriting an object that is currently being downloaded should not cause any unexpected side effects.

Updates to a single key are atomic. For example, if you PUT to an existing key, a subsequent read might return the old data or the updated data, but it will never write corrupted or partial data.

http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel

like image 142
Michael - sqlbot Avatar answered Sep 23 '22 13:09

Michael - sqlbot