
If I stream a file to S3, will the event trigger once the file is complete?

As the title says, if I attach an event notification to an S3 bucket for put events and I stream a file to that bucket, will the event trigger as soon as the upload starts? That way a receiver could start downloading the stream for that file.

Or will the event only be triggered after the file has finished uploading?

asked Jul 08 '15 by iCodeLikeImDrunk


People also ask

How do I trigger AWS Step Functions after a file is uploaded to an S3 bucket?

Navigate to the Step Functions console and select the state machine used in your Amazon EventBridge rule (Helloworld). Select the most recent execution of that state machine and expand the Execution Input section. This input includes information such as the bucket name and the object name.

What happens if you upload the same file to S3?

If you upload a file to a key that already exists, the existing object is replaced, unless versioning is enabled. S3 supports versioning: with it enabled, uploading to the same key twice stores two versions of the object. Note that if you upload the exact same file twice, you pay for two identical copies of it on S3.
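
A minimal boto3 sketch of that behaviour, with made-up bucket and key names:

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "example-bucket", "reports/data.csv"  # hypothetical names

    # Turn versioning on so repeated uploads to the same key keep every copy.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Upload to the same key twice; with versioning on, both copies are kept
    # (and both are billed), even if the contents are identical.
    s3.put_object(Bucket=bucket, Key=key, Body=b"first upload")
    s3.put_object(Bucket=bucket, Key=key, Body=b"first upload")

    # Two distinct versions now exist under the one key.
    for v in s3.list_object_versions(Bucket=bucket, Prefix=key).get("Versions", []):
        print(v["VersionId"], v["Size"])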

Can you stream data to S3?

You can set up a Kinesis stream to S3 and start streaming your data to Amazon S3 buckets using the following steps: Step 1: Sign in to the AWS Console for Amazon Kinesis. Step 2: Configure the delivery stream. Step 3: Transform records using a Lambda function (optional).
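
The same setup can be driven from code once the delivery stream exists; the boto3 sketch below assumes a Kinesis Data Firehose delivery stream named "example-stream" (a made-up name) has already been configured with an S3 destination:

    import boto3

    firehose = boto3.client("firehose")

    # Push records into an existing delivery stream; Firehose buffers them
    # and writes the batches out to the configured S3 bucket.
    for line in [b"event-1\n", b"event-2\n", b"event-3\n"]:
        firehose.put_record(
            DeliveryStreamName="example-stream",  # hypothetical stream name
            Record={"Data": line},
        )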


1 Answer

There are two problems with what you are contemplating:

  • The event does not fire until the upload is finished (see the sketch after this list).

  • Writing objects into S3 is always an atomic operation. The write either completes successfully, or it doesn't happen at all... and until it completes successfully, the object does not actually exist in the bucket.
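
As a concrete illustration of the first point, a bucket notification can be wired to a Lambda function for object-created events; S3 only emits the event once the object has been completely and successfully written. A minimal boto3 sketch, with hypothetical bucket and function names:

    import boto3

    s3 = boto3.client("s3")

    # Invoke a Lambda function whenever an object is created by a PUT.
    # The notification is sent only after the upload has completed
    # successfully; a partial or failed upload never triggers it.
    s3.put_bucket_notification_configuration(
        Bucket="example-bucket",  # hypothetical bucket
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {
                    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:on-upload",
                    "Events": ["s3:ObjectCreated:Put"],
                }
            ]
        },
    )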

If you are writing a new object into a bucket, authorized requests for the object will return a 404 error at least until the upload completes successfully.

If you are overwriting an existing object, authorized requests for the object will always return the old copy of the object, unchanged and undamaged, at least until the overwrite completes successfully.

Note the use of "at least until," above.
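
In practical terms, a consumer that tries to read while the producer is still uploading simply sees a missing key (or, for an overwrite, the old copy). A small boto3 sketch of what the consumer observes, with hypothetical names:

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "example-bucket", "incoming/video.mp4"  # hypothetical names

    # While the writer's upload is still in progress, the key does not exist.
    try:
        s3.get_object(Bucket=bucket, Key=key)
    except s3.exceptions.NoSuchKey:
        print("Not visible yet: the upload has not completed")

    # Once the writer's put_object() or multipart upload returns success, the
    # same GET returns the complete object (subject to the caveat below about
    # having requested the key before it was created).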

Uploads of new objects are generally readable immediately after the upload, in all regions. Formerly, the us-east-1 region of S3 (Northern Virginia, which was previously designated the "US Standard" region) did not offer immediate (read-after-write) consistency for new objects, and a brief delay was sometimes possible there, but it now offers the same immediate consistency as the other regions.

However, there is a catch: the object must not have been requested before it was uploaded. If it has been, the consistency model breaks.¹

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.

http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#Regions
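
To make the caveat concrete, this is the access pattern that historically downgraded a new object to eventual consistency; names are hypothetical:

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    bucket, key = "example-bucket", "new/object.txt"  # hypothetical names

    # 1. Check for the key before it exists (a HEAD that returns 404)...
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except ClientError as err:
        assert err.response["Error"]["Code"] == "404"

    # 2. ...then create it...
    s3.put_object(Bucket=bucket, Key=key, Body=b"payload")

    # 3. ...and the first reads after the PUT could still return 404 for a
    # while (eventual consistency) instead of read-after-write consistency.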

In all regions, overwrites of existing objects may also encounter a brief delay (and deleted objects may remain accessible for a brief time after deletion).

This potential delay is because of S3's eventual consistency model for some operations, as described above. S3 does not guarantee that every operation will be immediately visible, though for practical purposes it almost always is. What S3 does guarantee is that if your upload completes successfully, with a success response from S3, then your object is committed to the S3 backing store.

The above applies to single PUT uploads, PUT+Copy operations, and multipart uploads.
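
The all-or-nothing behaviour is easiest to see with a multipart upload: parts can be sent over a long period, but the object only appears in the bucket (and the event only fires) when the upload is completed. A minimal boto3 sketch, with hypothetical names and a single small part:

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "example-bucket", "streams/capture.bin"  # hypothetical names

    # Start a multipart upload; nothing is visible in the bucket yet.
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)

    # Upload parts over any length of time; the object still does not exist.
    part = s3.upload_part(
        Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
        PartNumber=1, Body=b"streamed bytes...",
    )

    # Only this call makes the object exist (atomically), and only now does an
    # s3:ObjectCreated:CompleteMultipartUpload notification fire.
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": [{"ETag": part["ETag"], "PartNumber": 1}]},
    )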

For these reasons, S3 cannot stream your file out to a consumer while the upload is still in progress.

To do that requires a different solution (though S3 could of course be used as a permanent repository after the streaming is done).


¹ "The consistency model breaks": almost certainly, this indicates that a request arriving at S3 consults a replica of the bucket index, and if that index has no knowledge of whether an object exists, it consults a more authoritative version of the index. If it still finds nothing, it locally "remembers" that the object doesn't exist, because that upstream lookup is a comparatively expensive operation -- thus it won't look upstream again on subsequent requests -- but once the creation of the new object propagates into the local index, the object becomes available. The same theory explains the eventual consistency of overwrites and deletes.

answered Dec 12 '22 by Michael - sqlbot