 

copy data from s3 to local with prefix

I am trying to copy data from S3 to local, filtered by a key prefix, using the aws-cli.

But whichever wildcard pattern I try, I get an error.

aws s3 cp s3://my-bucket-name/RAW_TIMESTAMP_0506* . --profile prod

error:

no matches found: s3://my-bucket-name/RAW_TIMESTAMP_0506*

Asked Jun 16 '17 by Bhavesh


People also ask

How do I transfer data from S3 bucket to local?

To copy all objects in an S3 bucket to your local machine, use the aws s3 cp command with the --recursive option. For example, aws s3 cp s3://big-datums-tmp/ ./ --recursive will copy all files from the “big-datums-tmp” bucket to the current working directory on your local machine.

What is S3 path prefix?

A key prefix is a string of characters that can be the complete path in front of the object name (including the bucket name). For example, if an object (123.txt) is stored as BucketName/Project/WordFiles/123.txt, the prefix might be “BucketName/Project/WordFiles/”.

How does S3 prefix work?

You can use prefixes to organize the data that you store in Amazon S3 buckets. A prefix is a string of characters at the beginning of the object key name. A prefix can be any length, subject to the maximum length of the object key name (1,024 bytes).
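For example, a quick way to see which keys share a prefix is the lower-level s3api listing command (bucket and prefix below are taken from the question, so adjust to your own):

aws s3api list-objects-v2 --bucket my-bucket-name --prefix RAW_TIMESTAMP_0506 --query 'Contents[].Key'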

Does S3 sync copy locally?

The s3 sync command copies objects from the local folder to the destination bucket if the size of the objects differs, or if the last modified time of the source is newer than the last modified time of the destination.
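If sync fits your workflow better than cp, a sketch for the question's prefix would be the following (sync accepts the same --exclude/--include filters as cp):

aws s3 sync s3://my-bucket-name/ . --exclude "*" --include "RAW_TIMESTAMP_0506*"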


4 Answers

aws s3 cp s3://my-bucket/ <local directory path> --recursive --exclude "*" --include "<prefix>*"

This will copy only the files with the given prefix.
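Applied to the bucket, prefix, and profile from the question, that would look something like:

aws s3 cp s3://my-bucket-name/ . --recursive --exclude "*" --include "RAW_TIMESTAMP_0506*" --profile prod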

Answered Oct 24 '22 by Eyshika


The above answers do not work properly for me. For example, I have many thousands of files in a directory, organized by date, and I wish to retrieve only the files that I need. So I tried the version that should be correct per the documentation:

aws s3 cp s3://mybucket/sub /my/local/ --recursive --exclude "*" --include "20170906*.png"

and it did not download only the prefixed files; instead, it began downloading everything.

So then I tried the example from the answer above:

aws s3 cp s3://mybucket/sub/ . /my/local --recursive --include "20170906*"

and it also downloaded everything. This seems to be an ongoing issue with the aws cli, and they have no intention of fixing it. Here are some workarounds that I found while Googling, but they are less than ideal:

https://github.com/aws/aws-cli/issues/1454
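One workaround along those lines (a sketch, not taken verbatim from the linked issue) is to list the matching keys server-side with s3api and copy them one at a time; the bucket and prefix below are the ones from this answer:

# list keys that start with the prefix, then copy each one individually
aws s3api list-objects-v2 --bucket mybucket --prefix "sub/20170906" --query 'Contents[].Key' --output text \
  | tr '\t' '\n' | while read -r key; do aws s3 cp "s3://mybucket/$key" /my/local/; done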

Answered Oct 24 '22 by Patrick Francis


The aws s3 cp command will not accept a wildcard as part of the filename (key). Instead, you must use the --include and --exclude parameters to define filenames.

From: Use of Exclude and Include Filters

Currently, there is no support for the use of UNIX style wildcards in a command's path arguments. However, most commands have --exclude "<value>" and --include "<value>" parameters that can achieve the desired result. These parameters perform pattern matching to either exclude or include a particular file or object. The following pattern symbols are supported.

So, you would use something like:

aws s3 cp s3://my-bucket-name/ . --recursive --exclude "*" --include "RAW_TIMESTAMP_0506*"

Answered Oct 24 '22 by John Rotenstein


If you don't like silent consoles, you can pipe aws s3 ls through awk and back to aws s3 cp.

Example

# url must be the entire prefix that includes folders.
# Ex.: url='s3://my-bucket-name/folderA/folderB',
# not url='s3://my-bucket-name'
url='s3://my-bucket-name/folderA/folderB'
prefix='RAW_TIMESTAMP_0506'
aws s3 ls "$url/$prefix" | awk '{system("aws s3 cp '"$url"'/"$4 " .")}'

Explanation

  • The ls part is pretty simple. I'm using variables to simplify and shorten the command. Always wrap shell variables in double quotes to prevent disaster.
  • awk '{print $4}' would extract only the filenames from the ls output (NOT the S3 key! This is why url must be the entire prefix that includes folders.)
  • awk '{system("echo "$4)}' would do the same thing, but it accomplishes this by calling another command. Note: I did NOT use a subshell $(...), because that would run the entire ls | awk part before starting cp. That would be slow, and it wouldn't print anything for a looong time.
  • awk '{system("echo aws s3 cp "$4 " .")}' would print commands that are very close to the ones we want. Pay attention to the spacing. If you try to run this, you'll notice something isn't quite right. This would produce commands like aws s3 cp RAW_TIMESTAMP_05060402_whatever.log .
  • awk '{system("echo aws s3 cp '$url'/"$4 " .")}' is what we're looking for. This adds the path to the filename. Look closely at the quotes. Remember we wrapped the awk parameter in single quotes, so we have to close and reopen the quotes if we want to use a shell variable in that parameter.
  • awk '{system("aws s3 cp '"$url"'/"$4 " .")}' is the final version. We just remove echo to actually execute the commands created by awk. Of course, I've also surrounded the $url variable with double quotes, because it's good practice.
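If the quoting inside the system() calls feels fragile, an equivalent loop-based sketch (same assumptions about $url and $prefix as above) is:

aws s3 ls "$url/$prefix" | awk '{print $4}' | while read -r key; do
  aws s3 cp "$url/$key" .
done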
Answered Oct 24 '22 by musicin3d