
AWS S3 Sync very slow when copying to large directories

When syncing data to an empty directory in S3 using AWS-CLI, it's almost instant. However, when syncing to a large directory (several million folders), it takes a very long time before even starting to upload / sync the files.

Is there an alternative method? It looks like it's trying to enumerate every file in the S3 directory before syncing. I don't need that; uploading the data without checking beforehand would be fine.

King Dedede asked Jan 24 '17 18:01



2 Answers

The sync command has to enumerate all of the objects under the destination in the bucket to determine whether each local file already exists there and whether it matches the local copy. The more objects you have in the bucket, the longer that takes.
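As a rough illustration of why that enumeration dominates, here is a back-of-the-envelope sketch in Python. It relies on one known limit (S3's list-objects calls return at most 1,000 keys per page) and one assumption that is purely hypothetical: a round-trip time of 0.1 seconds per list request. The function name `estimate_listing` is made up for this example.

```python
import math

# S3's list-objects APIs return at most 1,000 keys per response page.
MAX_KEYS_PER_PAGE = 1000


def estimate_listing(object_count, seconds_per_request=0.1):
    """Return (list requests needed, estimated seconds) to enumerate a prefix.

    seconds_per_request is an assumed latency, not a measured value.
    """
    requests = math.ceil(object_count / MAX_KEYS_PER_PAGE)
    return requests, requests * seconds_per_request


if __name__ == "__main__":
    # An empty destination needs no paging at all; millions of keys need thousands of pages.
    for count in (0, 10_000, 5_000_000):
        requests, seconds = estimate_listing(count)
        print(f"{count:>9} objects: {requests:>5} list requests, ~{seconds:.0f} s")
```

Under those assumptions, an empty destination costs nothing to enumerate, while a few million objects mean thousands of paged requests before the first upload can start, which matches the delay described in the question.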

If you don't need this sync behavior just use a recursive copy command like:

aws s3 cp --recursive . s3://mybucket/

and this should copy all of the local files in the current directory to the bucket in S3.
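The trade-off between the two commands can be sketched with a simplified model in Python. This is not the AWS CLI's code; it only mirrors the documented default rule that sync uploads a file when it is missing remotely, its size differs, or the local copy is newer. The helper names `plan_sync` and `plan_copy` and the sample data are hypothetical.

```python
def plan_sync(local_files, remote_listing):
    """Pick files to upload; requires the full remote listing up front.

    Both arguments map key -> (size_in_bytes, last_modified_timestamp).
    """
    uploads = []
    for key, (size, mtime) in local_files.items():
        remote = remote_listing.get(key)
        # Upload if missing remotely, size differs, or local copy is newer.
        if remote is None or remote[0] != size or mtime > remote[1]:
            uploads.append(key)
    return sorted(uploads)


def plan_copy(local_files):
    """A recursive copy uploads everything and never consults the remote side."""
    return sorted(local_files)


if __name__ == "__main__":
    local = {"a.txt": (10, 100), "b.txt": (20, 200), "c.txt": (30, 300)}
    remote = {"a.txt": (10, 150), "b.txt": (25, 250)}
    print("sync uploads:", plan_sync(local, remote))
    print("copy uploads:", plan_copy(local))
```

In this model the copy plan needs no remote listing, so it can start immediately, but it also re-uploads files that are already present and unchanged in the bucket.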

garnaat answered Sep 24 '22 08:09


If you use the unofficial s3cmd from S3 Tools, you can pass the --no-check-md5 option to sync to skip the MD5 sum comparison, which can significantly speed up the process.

--no-check-md5        Do not check MD5 sums when comparing files for [sync].
                        Only size will be compared. May significantly speed up
                        transfer but may also miss some changed files.

Source: https://s3tools.org/usage

Example: s3cmd --no-check-md5 sync /directory/to/sync s3://mys3bucket/
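The "may also miss some changed files" caveat can be seen in a small Python sketch. It is not s3cmd's implementation, only an illustration of a size-only check versus an MD5 check on two local files; the helper names and file contents are made up for the example.

```python
import hashlib
import os
import tempfile


def same_by_size(path_a, path_b):
    """Cheap check: only compares file sizes."""
    return os.path.getsize(path_a) == os.path.getsize(path_b)


def md5_of(path, chunk_size=1 << 20):
    """Read the whole file in chunks and return its MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_by_md5(path_a, path_b):
    """Expensive check: has to read every byte of both files."""
    return md5_of(path_a) == md5_of(path_b)


def demo():
    """Two files with equal length but different content."""
    with tempfile.TemporaryDirectory() as workdir:
        old = os.path.join(workdir, "old.txt")
        new = os.path.join(workdir, "new.txt")
        with open(old, "wb") as handle:
            handle.write(b"version 1")
        with open(new, "wb") as handle:
            handle.write(b"version 2")
        return same_by_size(old, new), same_by_md5(old, new)


if __name__ == "__main__":
    size_match, md5_match = demo()
    print("size-only says unchanged:", size_match)
    print("MD5 says unchanged:", md5_match)
```

The size-only check treats the two files as identical even though their contents differ, so an edit that keeps the byte count the same would be skipped.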

spoonsearch answered Sep 21 '22 08:09