 

Downloading from S3 with aws-cli using filter on specific prefix

For some reason there's a bucket with a bunch of different files, all of which have the same prefix but with different dates:

backup.2017-01-01aa
backup.2017-01-01ab
backup.2017-01-15aa
backup.2017-01-15ab
backup.2017-02-01aa
backup.2017-02-01ab
etc.

How do I download only files that start with "backup.2017-01-01"?

asked Jul 25 '17 by ZN13

2 Answers

I think --include does the filtering locally, so if your bucket contains millions of files the command can take hours to run: it first has to retrieve a listing of every key in the bucket, which also generates extra network traffic.

But aws s3 ls accepts a key prefix, so it lists only the matching files without that overhead. So you can run

aws s3 ls s3://yourbucket/backup.2017-

to see your files, and something like

aws s3 ls s3://yourbucket/backup.2017- | colrm 1 31 | xargs -I % aws s3 cp s3://yourbucket/% .

to copy your files.
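The colrm 1 31 step strips the fixed-width date/time/size columns from the aws s3 ls output, which breaks if the column widths ever differ. A field-based sketch (assuming the default four-column ls output and keys without spaces; yourbucket is a placeholder) would extract the fourth field instead:

```shell
# Sample of what `aws s3 ls s3://yourbucket/backup.2017-01-01` prints:
listing='2017-01-01 10:07:02     123456 backup.2017-01-01aa
2017-01-01 10:07:05     123456 backup.2017-01-01ab'

# Keep only the key (4th field) of each line; breaks on keys containing spaces
keys=$(printf '%s\n' "$listing" | awk '{print $4}')
printf '%s\n' "$keys"
# prints:
# backup.2017-01-01aa
# backup.2017-01-01ab

# Those keys can then feed the copy, e.g.:
#   printf '%s\n' "$keys" | xargs -I % aws s3 cp s3://yourbucket/% .
```

This survives changes in column width, though keys with embedded spaces would still need more careful parsing.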

answered Sep 20 '22 by Sampo Smolander

You'll have to use aws s3 sync s3://yourbucket/ . for this.

There are two parameters you can give to aws s3 sync: --exclude and --include, both of which accept the "*" wildcard.

First we --exclude "*" to exclude all of the files, then we --include "backup.2017-01-01*" to add back the files with the prefix we want. You can of course vary the include pattern, so something like --include "*-01-01*" also works.
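The order matters because the CLI applies filters in sequence and the last matching filter decides whether a key is copied. A rough local sketch of that "later filter wins" rule, using plain shell globbing (this is an illustration, not the CLI's actual code):

```shell
# Sketch of s3 sync filter semantics: filters apply in order,
# the last --exclude/--include that matches a key wins.
included() {
  key=$1
  decision=include   # by default, sync includes everything
  for rule in "exclude:*" "include:backup.2017-01-01*"; do
    action=${rule%%:*}
    pat=${rule#*:}
    case $key in
      $pat) decision=$action ;;
    esac
  done
  [ "$decision" = include ]
}

included backup.2017-01-01aa && echo "copy backup.2017-01-01aa"
included backup.2017-02-01aa || echo "skip backup.2017-02-01aa"
```

Reversing the two rules (include first, then exclude "*") would exclude everything, which is why the exclude must come before the include on the command line.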

That's it, here's the full command:

aws s3 sync s3://yourbucket/ . --exclude "*" --include "backup.2017-01-01*"

Also, remember to use --dryrun to test your command and avoid downloading all files in the bucket.

answered Sep 20 '22 by ZN13