
Exclude multiple folders using AWS S3 sync

How do I exclude multiple folders when using aws s3 sync?

I tried:

    # aws s3 sync s3://inksedge-app-file-storage-bucket-prod-env \
                  s3://inksedge-app-file-storage-bucket-test-env \
                  --exclude 'reportTemplate/* orders/* customers/*'

But it still syncs the "customers" folder.

Output:

    copy: s3://inksedge-app-file-storage-bucket-prod-env/customers/116/miniimages/IMG_4800.jpg to s3://inksedge-app-file-storage-bucket-test-env/customers/116/miniimages/IMG_4800.jpg
    copy: s3://inksedge-app-file-storage-bucket-prod-env/customers/116/miniimages/DSC_0358.JPG to s3://inksedge-app-file-storage-bucket-test-env/customers/116/miniimages/DSC_0358.JPG
Ashish Karpe asked Sep 04 '15 08:09



2 Answers

At last this worked for me, with each pattern passed in its own --exclude option:

    aws s3 sync s3://my-bucket s3://my-other-bucket \
        --exclude 'customers/*' \
        --exclude 'orders/*' \
        --exclude 'reportTemplate/*'
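
When the list of folders grows, repeating the option by hand gets error-prone. Here is a minimal sketch of a wrapper, assuming a POSIX shell. The function name `sync_excluding` and the `AWS_CMD` variable are hypothetical names introduced for this example, not part of the AWS CLI. It appends one `--exclude` per folder and passes `--dryrun`, so nothing is copied while you check the filters.

```shell
#!/bin/sh
# Hypothetical helper: sync SRC to DST, skipping every listed top-level folder.
# AWS_CMD is an assumption of this sketch; it defaults to the real `aws` binary.
sync_excluding() (
  src=$1
  dst=$2
  shift 2

  # Replace each remaining argument (a folder name) with "--exclude 'name/*'".
  # The loop list is expanded once, so shifting inside the loop is safe.
  for prefix in "$@"; do
    shift
    set -- "$@" --exclude "$prefix/*"
  done

  # --dryrun lists the operations without performing them.
  "${AWS_CMD:-aws}" s3 sync "$src" "$dst" --dryrun "$@"
)

# Print the command instead of running it by substituting echo for aws.
AWS_CMD=echo sync_excluding s3://my-bucket s3://my-other-bucket customers orders reportTemplate
# prints: s3 sync s3://my-bucket s3://my-other-bucket --dryrun --exclude customers/* --exclude orders/* --exclude reportTemplate/*
```

Once the dry-run listing looks right, drop `--dryrun` from the function to perform the actual sync.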

Hint: enclose wildcards and special characters in single or double quotes; otherwise the shell may expand them before the AWS CLI sees them. The supported pattern symbols are listed below. For more information on the S3 commands, see the AWS CLI documentation.

    *: Matches everything
    ?: Matches any single character
    [sequence]: Matches any character in sequence
    [!sequence]: Matches any character not in sequence
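
To see why the original attempt failed, the symbols above can be tried locally. This is only a rough sketch: it uses the shell's own `case` matching as a stand-in for the CLI's matcher (an assumption; the two are similar for these symbols but not guaranteed identical), and `matches` is a hypothetical helper name. The point it illustrates is that a single quoted string containing three space-separated globs is one pattern, not three.

```shell
#!/bin/sh
# matches KEY PATTERN: succeed if the whole KEY matches the glob PATTERN.
# The unquoted $2 is treated as a pattern by `case`; "*" also matches "/".
matches() {
  case $1 in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

key='customers/116/miniimages/IMG_4800.jpg'

# The pattern from the question: one string, so the key would have to
# start with "reportTemplate/" to match.
if matches "$key" 'reportTemplate/* orders/* customers/*'; then
  echo "combined pattern: excluded"
else
  echo "combined pattern: not excluded"
fi

# A pattern of its own matches the key.
if matches "$key" 'customers/*'; then
  echo "separate pattern: excluded"
else
  echo "separate pattern: not excluded"
fi
# prints:
#   combined pattern: not excluded
#   separate pattern: excluded
```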
Ashish Karpe answered Oct 01 '22 03:10


For those looking to sync a subfolder of a bucket: the exclude and include filters are matched against paths relative to the folder being synced, not against the full path from the bucket root. For example:

    aws s3 sync s3://bucket1/bootstrap/ s3://bucket2/bootstrap --exclude '*' --include 'css/*'

would sync the folder bootstrap/css, but neither bootstrap/js nor bootstrap/fonts, in the following folder tree:

    bootstrap/
    ├── css/
    │   ├── bootstrap.css
    │   ├── bootstrap.min.css
    │   ├── bootstrap-theme.css
    │   └── bootstrap-theme.min.css
    ├── js/
    │   ├── bootstrap.js
    │   └── bootstrap.min.js
    └── fonts/
        ├── glyphicons-halflings-regular.eot
        ├── glyphicons-halflings-regular.svg
        ├── glyphicons-halflings-regular.ttf
        └── glyphicons-halflings-regular.woff

That is, the filter is 'css/*' and not 'bootstrap/css/*'.
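
The way the two filters combine can be sketched locally. The assumptions here: everything is included by default, filters are applied in the order given with later ones taking precedence, and the shell's `case` matching approximates the CLI's pattern matching. `would_sync` and the `exclude:GLOB` / `include:GLOB` notation are hypothetical, made up for this illustration.

```shell
#!/bin/sh
# would_sync KEY FILTER...: each FILTER is "exclude:GLOB" or "include:GLOB".
# KEY is a path relative to the folder being synced.
would_sync() (
  key=$1
  shift
  decision=include                 # nothing is filtered out by default
  for filter in "$@"; do
    kind=${filter%%:*}             # text before the first colon
    glob=${filter#*:}              # text after the first colon
    case $key in
      $glob) decision=$kind ;;     # the last matching filter wins
    esac
  done
  [ "$decision" = include ]
)

# Keys as seen when syncing s3://bucket1/bootstrap/ (no "bootstrap/" prefix).
for key in css/bootstrap.css js/bootstrap.js fonts/glyphicons-halflings-regular.eot; do
  if would_sync "$key" 'exclude:*' 'include:css/*'; then
    echo "sync $key"
  else
    echo "skip $key"
  fi
done
# prints:
#   sync css/bootstrap.css
#   skip js/bootstrap.js
#   skip fonts/glyphicons-halflings-regular.eot
```

With 'include:bootstrap/css/*' in place of 'include:css/*', none of these relative keys would match the include filter, so all three would be skipped.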

More at https://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters

Raphael Fernandes answered Oct 01 '22 04:10