
How does `aws s3 sync` determine if a file has been updated?

When I run the command in the terminal twice, back to back, it doesn't sync anything the second time. Which is great! It shouldn't. But if I run my build process and then run `aws s3 sync` programmatically, back to back, it syncs all the files both times, as if my build process changes something on every run.

Can't figure out what might be happening. Any ideas?

My build process is basically `pug source/ --out static-site/` and `stylus -c styles/ --out static-site/styles/`.

Costa Michailidis asked Apr 20 '17 21:04



1 Answer

From the AWS CLI `sync` documentation:

A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.

--size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.
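The two quoted rules can be restated as a small decision function. This is only a sketch of the documented behavior, not the AWS CLI's actual code; `FileInfo` and `needs_upload` are hypothetical names made up for illustration. It also shows why a rebuild is likely to trigger a full re-sync: if the build rewrites every output file, each local modification time moves past the time of the previous upload even when the size is unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical stand-in for the metadata the comparison looks at."""
    size: int     # size in bytes
    mtime: float  # last modified time, seconds since the epoch


def needs_upload(local: FileInfo, remote: Optional[FileInfo],
                 size_only: bool = False) -> bool:
    """Restate the documented rule; an assumption-laden sketch, not AWS code."""
    if remote is None:
        # The file does not exist under the bucket and prefix yet.
        return True
    if local.size != remote.size:
        return True
    if size_only:
        # With --size-only, size is the only criterion.
        return False
    # Default rule: upload when the local file is newer than the S3 object.
    return local.mtime > remote.mtime


uploaded = FileInfo(size=1024, mtime=1000.0)  # object as stored after the first sync
rebuilt = FileInfo(size=1024, mtime=2000.0)   # same bytes, rewritten by a later build

print(needs_upload(rebuilt, uploaded))                  # True
print(needs_upload(rebuilt, uploaded, size_only=True))  # False
print(needs_upload(rebuilt, None, size_only=True))      # True
```

The last call matches the case of a brand-new file: it is uploaded regardless of `size_only`, because there is nothing on the remote side to compare against.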

You want the `--size-only` option, which looks only at the file size and not at the last modified date. This is perfect for an asset build system that changes the last modified date frequently but not the actual contents of the files (I ran into this with webpack builds, where things like fonts kept syncing even though the file contents were identical). If your build doesn't incorporate a hash of the contents into the filename, you might run into problems (a build can emit a file of the same size but with different contents, and that file won't be synced), so watch out for that.
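To make the same-size pitfall concrete, here is a minimal sketch. The CSS snippets and the `hashed_name` helper are invented for illustration and are not taken from any particular build tool. Two different stylesheets can have identical sizes, so a size-only comparison treats them as unchanged, while a content hash in the filename gives the new version a new key.

```python
import hashlib


def hashed_name(stem: str, ext: str, data: bytes) -> str:
    """Hypothetical helper: embed a short content hash in the filename."""
    short_hash = hashlib.sha256(data).hexdigest()[:8]
    return f"{stem}.{short_hash}{ext}"


old_css = b"body{color:red}"
new_css = b"body{color:tan}"  # different content, same number of bytes

same_size = len(old_css) == len(new_css)
same_name = hashed_name("styles", ".css", old_css) == hashed_name("styles", ".css", new_css)

print(same_size)  # True: a size-only comparison would skip the new file
print(same_name)  # False: the hashed filename changes with the content
```

With hashed filenames, the changed file has no counterpart in the bucket under its new name, so it falls under the "does not exist" rule and is uploaded even with `--size-only`.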

I did manually test adding a new file that wasn't in the remote bucket, and it is indeed added to the remote bucket with `--size-only`.

Cymen answered Oct 10 '22 07:10
