 

S3 concurrent writes

I think I'm having a problem with concurrent S3 writes. Two (or more) processes are writing almost the same content to the same S3 location at the same time. I'd like to determine the concurrency rules that govern how this situation will play out.

By design, all of the processes but one will get killed while writing to S3. (I said they are writing "almost" the same content because all but one of the processes get killed. If all processes were allowed to live, they would end up writing exactly the same content.)

My theory is that the killed process is leaving an incomplete file on S3, and that the other file (which presumably was written fully) is not being chosen as the one that gets to live on S3. I'd like to prove or disprove this theory. (I'm trying to find out whether the issues are caused by concurrency during the write to S3, or by something that happens at some other time.)
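One way I could probe this empirically is to deliberately race several complete uploads against the same key and then inspect what ends up stored. A minimal sketch with boto3 (BUCKET and KEY are placeholders, not my real location):

```python
# Rough test harness for the theory: several threads PUT distinct,
# complete payloads to the same key at the same time, then we GET the
# key and check that what's stored matches exactly one writer's full
# payload (i.e. nothing torn or partial survived).
import threading

import boto3

BUCKET = "my-test-bucket"        # placeholder
KEY = "concurrency-test/object"  # placeholder

s3 = boto3.client("s3")

# Give each writer a unique marker so we can tell afterwards who "won".
payloads = [f"writer-{i}\n".encode() * 100_000 for i in range(4)]

def writer(body: bytes) -> None:
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=body)

threads = [threading.Thread(target=writer, args=(p,)) for p in payloads]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Read the object back and compare it against every candidate payload.
stored = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
winners = [i for i, p in enumerate(payloads) if p == stored]
print("stored object matches writer(s):", winners)  # expect exactly one, never zero
```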

From the FAQ at http://aws.amazon.com/s3/faqs/:

Q: What data consistency model does Amazon S3 employ?

Amazon S3 buckets in the US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney) and South America (Sao Paulo) Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES. Amazon S3 buckets in the US Standard Region provide eventual consistency.

I'm using the US Standard Region.

  • What does this answer say about concurrent writes? I think I understand the difference between "read-after-write consistency" and "eventual consistency", but only in the context of what one sees when reading the object just after the write completes.
  • Is it possible for the killed process to "win" and therefore end up with an incomplete file on S3? Or does S3 somehow ensure that the file will only get placed on S3 if the whole PUT operation completes?
  • How does S3 decide which file "wins"? This is the real question here.
asked Jan 30 '13 by Eddified

People also ask

Does S3 support concurrent writes?

We can have a scenario where there are simultaneous writes, meaning that before process one finishes writing the object, process two starts writing to that object as well. This is what we call concurrent writes. In this scenario, S3 uses last-write-wins semantics.
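As an illustration, a minimal boto3 sketch of last-write-wins on back-to-back PUTs (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-test-bucket"  # placeholder

s3.put_object(Bucket=BUCKET, Key="same-key", Body=b"first write")
s3.put_object(Bucket=BUCKET, Key="same-key", Body=b"second write")

# With last-write-wins semantics, the object reflects whichever PUT
# completed last; earlier writes to the same key are simply replaced.
body = s3.get_object(Bucket=BUCKET, Key="same-key")["Body"].read()
print(body)  # b"second write"
```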

Is S3 strongly consistent or eventually consistent?

S3 also provides strong consistency for list operations, so after a write, you can immediately perform a listing of the objects in a bucket with any changes reflected.
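A quick way to see this with boto3 (bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-test-bucket"  # placeholder

s3.put_object(Bucket=BUCKET, Key="new-object", Body=b"hello")

# A listing issued immediately after the write already reflects it.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="new-object")
assert any(o["Key"] == "new-object" for o in resp.get("Contents", []))
```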

How many S3 buckets can I have per account by default?

By default, you can create up to 100 buckets in each of your AWS accounts. If you need additional buckets, you can increase your account bucket limit to a maximum of 1,000 buckets by submitting a service limit increase.

Does S3 provide read after write consistency for all regions?

Unlike other cloud providers, Amazon S3 delivers strong read-after-write consistency for any storage request, without changes to performance or availability, without sacrificing regional isolation for applications, and at no additional cost.
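For example, with boto3 (bucket name again a placeholder):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-test-bucket"  # placeholder

s3.put_object(Bucket=BUCKET, Key="fresh-key", Body=b"payload")

# Strong read-after-write consistency: the very next GET returns the
# bytes just written, with no eventual-consistency window.
assert s3.get_object(Bucket=BUCKET, Key="fresh-key")["Body"].read() == b"payload"
```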


1 Answer

I don't think that the consistency statements in that FAQ entry say anything about what will happen during concurrent writes to the same key.

However, it is not possible to have an incomplete file in S3: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html says:

Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket.

This implies that only a file that has been completely uploaded will exist at the specified key, but I suppose it's possible that such concurrent writes might tickle some error condition that results in no file being successfully uploaded. I'd do some testing to be sure; you might also wish to try using object versioning while you're at it and see if that behaves differently.
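If it helps, here's a rough boto3 sketch of the versioning idea (BUCKET and KEY are placeholders; assumes you're allowed to enable versioning on the bucket):

```python
# With versioning enabled, every concurrent PUT is preserved as its own
# version, so you can see exactly what each writer uploaded and which
# version S3 made "current".
import boto3

BUCKET = "my-test-bucket"        # placeholder
KEY = "concurrency-test/object"  # placeholder

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# ... run the concurrent writers against BUCKET/KEY here ...

# Inspect every version left behind; IsLatest marks the "winner".
for v in s3.list_object_versions(Bucket=BUCKET, Prefix=KEY).get("Versions", []):
    print(v["VersionId"], v["IsLatest"], v["LastModified"], v["ETag"])
```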

answered Sep 17 '22 by mpierce