AWS S3 Sync very slow when copying to large directories

Tags: amazon-web-services, amazon-s3, aws-cli, bigdata

Syncing data to an empty directory in S3 with the AWS CLI is almost instant. However, syncing to a large directory (several million folders) takes a very long time before any files even start to upload.

Is there an alternative method? It looks like the CLI is taking stock of every file in the S3 directory before syncing. I don't need that; uploading the data without checking beforehand would be fine.

476

asked Jan 24 '17 18:01

King Dedede

2 Answers

The sync command needs to enumerate all of the objects in the bucket to determine whether each local file already exists there and whether it is the same as the local file. The more objects you have in the bucket, the longer this takes.

If you don't need this sync behavior, just use a recursive copy command like:

aws s3 cp --recursive . s3://mybucket/

and this should copy all of the local files in the current directory to the bucket in S3.
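
As a minimal sketch of the recursive copy above, the command can be wrapped in a small shell function with an optional preview step. The function name blind_upload is hypothetical and only for illustration; --recursive and --dryrun are options of aws s3 cp. Note that a plain copy re-uploads every local file on each run and overwrites any object that already has the same key.

```shell
#!/usr/bin/env bash
# blind_upload SRC DEST [--dryrun]
# Hypothetical helper: copies everything under SRC to DEST without the
# comparison step that sync performs. With --dryrun as the third argument,
# the CLI only prints the operations it would perform.
blind_upload() {
  local src="$1" dest="$2" preview="${3:-}"
  if [ "$preview" = "--dryrun" ]; then
    aws s3 cp --recursive --dryrun "$src" "$dest"
  else
    aws s3 cp --recursive "$src" "$dest"
  fi
}
```

For example, blind_upload . s3://mybucket/ --dryrun previews the transfer, and the same call without --dryrun performs it.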

150

answered Sep 24 '22 08:09

garnaat

If you use the unofficial s3cmd from S3 Tools, you can pass the --no-check-md5 option to sync to disable the MD5 sum comparison, which can significantly speed up the process.

--no-check-md5        Do not check MD5 sums when comparing files for [sync].
                        Only size will be compared. May significantly speed up
                        transfer but may also miss some changed files.

Source: https://s3tools.org/usage

Example: s3cmd --no-check-md5 sync /directory/to/sync s3://mys3bucket/
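
Since the help text warns that a size-only comparison may miss changed files, it can be worth previewing a run first. The sketch below assumes s3cmd's --dry-run option, which reports what would be transferred without doing it; the function name quick_sync is hypothetical.

```shell
#!/usr/bin/env bash
# quick_sync SRC DEST [--dry-run]
# Hypothetical helper: runs s3cmd sync with MD5 comparison disabled, so
# only file sizes are compared. With --dry-run as the third argument,
# s3cmd only reports what it would transfer.
quick_sync() {
  local src="$1" dest="$2" preview="${3:-}"
  if [ "$preview" = "--dry-run" ]; then
    s3cmd --no-check-md5 --dry-run sync "$src" "$dest"
  else
    s3cmd --no-check-md5 sync "$src" "$dest"
  fi
}
```

For example, quick_sync /directory/to/sync s3://mys3bucket/ --dry-run shows the planned transfers before anything is uploaded.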

answered Sep 21 '22 08:09

spoonsearch
