I am trying to copy data from S3 to my local machine, filtered by a key prefix, using the aws-cli. But I get an error no matter which wildcard pattern I try:
aws s3 cp s3://my-bucket-name/RAW_TIMESTAMP_0506* . --profile prod
error:
no matches found: s3://my-bucket-name/RAW_TIMESTAMP_0506*
To copy all objects in an S3 bucket to your local machine, simply use the aws s3 cp command with the --recursive option. For example, aws s3 cp s3://big-datums-tmp/ ./ --recursive will copy all files from the "big-datums-tmp" bucket to the current working directory on your local machine.
A key prefix is a string of characters that can be the complete path in front of the object name (including the bucket name). For example, if an object (123.txt) is stored as BucketName/Project/WordFiles/123.txt, the prefix might be "BucketName/Project/WordFiles/".
You can use prefixes to organize the data that you store in Amazon S3 buckets. A prefix is a string of characters at the beginning of the object key name. A prefix can be any length, subject to the maximum length of the object key name (1,024 bytes).
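As an illustration (reusing the bucket name and profile from the question), aws s3 ls treats its path argument as exactly such a prefix, so you can list matching objects without any wildcard at all:
aws s3 ls s3://my-bucket-name/RAW_TIMESTAMP_0506 --profile prod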
The s3 sync command copies objects from the local folder to the destination bucket only if the size of the objects differs, or if the last modified time of the source is newer than the last modified time of the destination.
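sync accepts the same --exclude/--include filters as cp, so for the prefix problem in the question you could also write (a sketch, reusing the question's bucket, prefix, and profile):
aws s3 sync s3://my-bucket-name/ . --exclude "*" --include "RAW_TIMESTAMP_0506*" --profile prod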
aws s3 cp s3://my-bucket/ <local directory path> --recursive --exclude "*" --include "<prefix>*"
This will copy only the files with the given prefix.
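Applied to the bucket and prefix from the question, that would be:
aws s3 cp s3://my-bucket-name/ . --recursive --exclude "*" --include "RAW_TIMESTAMP_0506*" --profile prod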
The above answers do not work properly... for example, I have many thousands of files in a directory, organized by date, and I wish to retrieve only the files that are needed. So I tried the correct version per the documentation:
aws s3 cp s3://mybucket/sub /my/local/ --recursive --exclude "*" --include "20170906*.png"
It did not download just the prefixed files; it began to download everything. So then I tried the sample above:
aws s3 cp s3://mybucket/sub/ /my/local --recursive --include "20170906*"
and it also downloaded everything. It seems that this is an ongoing issue with the aws cli, and they have no intention of fixing it. Here are some workarounds that I found while Googling, but they are less than ideal:
https://github.com/aws/aws-cli/issues/1454
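One workaround in that spirit (my own sketch, not taken verbatim from the issue thread; the bucket and prefix are the examples from this question): use the lower-level s3api command, which filters by prefix server-side, then loop over the returned keys.
# List keys matching the prefix server-side, then copy each object individually.
# Assumes at least one key matches (list-objects-v2 prints "None" otherwise).
aws s3api list-objects-v2 --bucket my-bucket-name --prefix RAW_TIMESTAMP_0506 \
  --query 'Contents[].Key' --output text |
tr '\t' '\n' |
while read -r key; do
  aws s3 cp "s3://my-bucket-name/$key" .
done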
The aws s3 cp command will not accept a wildcard as part of the filename (key); instead, you must use the --include and --exclude parameters to define filenames. (The "no matches found" message in your error actually comes from the shell: zsh tries to expand the * itself and finds nothing. Quoting the argument silences zsh but doesn't help, because aws s3 cp then looks for a key literally named RAW_TIMESTAMP_0506*.)
From: Use of Exclude and Include Filters
Currently, there is no support for the use of UNIX style wildcards in a command's path arguments. However, most commands have --exclude "<value>" and --include "<value>" parameters that can achieve the desired result. These parameters perform pattern matching to either exclude or include a particular file or object.
So, you would use something like:
aws s3 cp s3://my-bucket-name/ . --recursive --exclude "*" --include "RAW_TIMESTAMP_0506*"
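If you want to preview what the filters will match before copying anything, the s3 commands also accept a --dryrun flag, which prints each operation without performing it:
aws s3 cp s3://my-bucket-name/ . --recursive --exclude "*" --include "RAW_TIMESTAMP_0506*" --dryrun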
If you don't like silent consoles, you can pipe aws s3 ls through awk and back to aws s3 cp.
Example
# url must be the entire prefix that includes folders.
# Ex.: url='s3://my-bucket-name/folderA/folderB',
# not url='s3://my-bucket-name'
url='s3://my-bucket-name/folderA/folderB'
prefix='RAW_TIMESTAMP_0506'
aws s3 ls "$url/$prefix" | awk '{system("aws s3 cp '"$url"'/"$4 " .")}'
Explanation
- The ls part is pretty simple. I'm using variables to simplify and shorten the command. Always wrap shell variables in double quotes to prevent disaster.
- awk '{print $4}' would extract only the filenames from the ls output (NOT the S3 key! This is why url must be the entire prefix that includes folders).
- awk '{system("echo " $4)}' would do the same thing, but it accomplishes this by calling another command. Note: I did NOT use a subshell $(...), because that would run the entire ls | awk part before starting cp. That would be slow, and it wouldn't print anything for a looong time.
- awk '{system("echo aws s3 cp "$4 " .")}' would print commands that are very close to the ones we want. Pay attention to the spacing. If you try to run this, you'll notice something isn't quite right. This would produce commands like aws s3 cp RAW_TIMESTAMP_05060402_whatever.log .
- awk '{system("echo aws s3 cp '$url'/"$4 " .")}' is what we're looking for. This adds the path to the filename. Look closely at the quotes. Remember we wrapped the awk parameter in single quotes, so we have to close and reopen the quotes if we want to use a shell variable in that parameter.
- awk '{system("aws s3 cp '"$url"'/"$4 " .")}' is the final version. We just remove echo to actually execute the commands created by awk. Of course, I've also surrounded the $url variable with double quotes, because it's good practice.
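An equivalent pipeline that avoids most of the quote gymnastics (a sketch under the same assumptions about $url and $prefix; like the awk version above, it breaks on filenames containing spaces):
aws s3 ls "$url/$prefix" | awk '{print $4}' | xargs -I {} aws s3 cp "$url/{}" .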