 

Python: upload large files to S3 fast

I am trying to programmatically upload a very large file (up to 1 GB) to S3. I found that AWS S3 supports multipart upload for large files, and I found some Python code to do it. (link )

My problem: the upload speed was too slow (almost 1 minute).

Is there any way to increase the performance of multipart upload? Or any good library that supports S3 uploading?

Phong Vu asked Apr 30 '18 17:04



People also ask

How can I upload files larger than 5gb to S3?

Note: If you use the Amazon S3 console, the maximum file size for uploads is 160 GB. To upload a file that is larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.

What is the best way for the application to upload the large files in S3?

When you upload large files to Amazon S3, it's a best practice to leverage multipart uploads. If you're using the AWS Command Line Interface (AWS CLI), then all high-level aws s3 commands automatically perform a multipart upload when the object is large. These high-level commands include aws s3 cp and aws s3 sync.
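For example, with the high-level CLI, no extra flags are needed; the bucket name below is illustrative, and the `aws configure set` tuning options are the CLI's documented S3 settings:

```shell
# aws s3 cp automatically switches to multipart upload for large files
# and uploads the parts in parallel ("mybucket" is illustrative).
aws s3 cp /path/to/file/upload.mp4 s3://mybucket/upload.mp4

# Tuning happens in the CLI configuration rather than on the command line:
aws configure set default.s3.multipart_chunksize 64MB
aws configure set default.s3.max_concurrent_requests 20
```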

Can we upload 6tb file to S3?

The largest single object that can be uploaded to an Amazon S3 bucket in a single PUT operation is 5 GB. If you want to upload larger objects (> 5 GB), you should consider using the multipart upload API, which allows you to upload objects from 5 MB up to 5 TB.
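To make the multipart API concrete, here is a minimal sketch of the low-level flow in boto3 (create, upload parts, complete, abort on failure). The `part_ranges` helper and the 100 MB part size are my own illustrative choices, not from the question:

```python
import os

def part_ranges(total_size, part_size):
    """Yield (part_number, offset, length) tuples covering a file of total_size bytes."""
    part_number, offset = 1, 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        yield part_number, offset, length
        part_number += 1
        offset += length

def multipart_upload(filename, bucket, key, part_size=100 * 1024 * 1024):
    import boto3  # deferred so part_ranges stays importable without boto3
    s3 = boto3.client("s3")
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        with open(filename, "rb") as f:
            for num, offset, length in part_ranges(os.path.getsize(filename), part_size):
                f.seek(offset)
                resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=num,
                                      UploadId=upload_id, Body=f.read(length))
                parts.append({"PartNumber": num, "ETag": resp["ETag"]})
        s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # An abandoned multipart upload keeps billing for stored parts.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

In practice `boto3`'s `upload_file` (as in the accepted answer) does all of this for you, including parallel part uploads and retries.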


2 Answers

Leaving my answer here for reference; performance roughly doubled with this code:

import os
import threading

import boto3
from boto3.s3.transfer import TransferConfig


s3_client = boto3.client('s3')

S3_BUCKET = 'mybucket'
FILE_PATH = '/path/to/file/'
KEY_PATH = "/path/to/s3key/"


class ProgressPercentage:
    """Thread-safe progress callback (adapted from the boto3 docs)."""

    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            print(f"{self._filename}  {self._seen_so_far} / {self._size:.0f}  ({percentage:.2f}%)")


def uploadFileS3(filename):
    # Sizes are in bytes; boto3 transparently raises the chunk size
    # to meet S3's 5 MiB minimum part size when needed.
    config = TransferConfig(multipart_threshold=1024 * 25,
                            max_concurrency=10,
                            multipart_chunksize=1024 * 25,
                            use_threads=True)
    file = FILE_PATH + filename
    key = KEY_PATH + filename
    s3_client.upload_file(file, S3_BUCKET, key,
                          ExtraArgs={'ACL': 'public-read', 'ContentType': 'video/mp4'},
                          Config=config,
                          Callback=ProgressPercentage(file))


uploadFileS3('upload.mp4')

Special thanks to @BryceH for the suggestion. Although this solution did increase the performance of S3 uploading, I'm still open to better solutions. Thanks!

Phong Vu answered Oct 22 '22 06:10



1 minute for 1 GB is quite fast for that much data over the internet. You should consider S3 transfer acceleration for this use case. https://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html

BryceH answered Oct 22 '22 06:10
