Stream a multi-GB file to AWS S3 from ASP.NET Core Web API

I wish to create a large (multi-GB) file in an AWS S3 bucket from an ASP.NET Core Web API. The file is sufficiently large that I wish not to load the Stream into memory prior to uploading it to AWS S3.

Using PutObjectAsync() I'm forced to pre-populate the Stream prior to passing it on to the AWS SDK, as illustrated below:

var putObjectRequest = new PutObjectRequest
{
    BucketName = "my-s3-bucket",
    Key = "my-file-name.txt",
    InputStream = stream
};
var putObjectResponse = await amazonS3Client.PutObjectAsync(putObjectRequest);

My ideal pattern would involve the AWS SDK returning a StreamWriter (of sorts) I could Write() to many times and then Finalise() when I'm done.

Two questions concerning my challenge:

  • Am I misinformed about having to pre-populate the Stream prior to calling PutObjectAsync()?
  • How should I go about uploading my large (multi-GB) file?
asked Jul 23 '17 by Frank

People also ask

How can I upload files larger than 5gb to S3?

Note: If you use the Amazon S3 console, the maximum file size for uploads is 160 GB. To upload a file that is larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.

What is the largest size file you can transfer to S3?

Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB.


1 Answer

For such situations the AWS documentation provides two options:

  • Using the AWS .NET SDK for Multipart Upload (High-Level API)
  • Using the AWS .NET SDK for Multipart Upload (Low-Level API)

The high-level API simply has you create a TransferUtilityUploadRequest with a PartSize specified, so the TransferUtility class performs the multipart upload itself without you having to manage it. You can track progress on the multipart upload by subscribing to the progress event, and you can upload a file, a stream, or a directory.
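A minimal sketch of the high-level approach (bucket name, key, and file path are placeholders; assumes the AWSSDK.S3 NuGet package). Since InputStream accepts any readable Stream, the multi-GB payload never has to sit in memory at once:

```csharp
using Amazon;
using Amazon.S3;
using Amazon.S3.Transfer;

// Hypothetical source stream; any readable Stream works here.
using var stream = File.OpenRead("huge-file.bin");

var s3Client = new AmazonS3Client(RegionEndpoint.USEast1);
var transferUtility = new TransferUtility(s3Client);

var uploadRequest = new TransferUtilityUploadRequest
{
    BucketName = "my-s3-bucket",   // placeholder
    Key = "my-file-name.txt",      // placeholder
    InputStream = stream,
    PartSize = 5 * 1024 * 1024     // 5 MB parts
};

// Progress reporting via the request's UploadProgressEvent.
uploadRequest.UploadProgressEvent += (sender, e) =>
    Console.WriteLine($"{e.TransferredBytes}/{e.TotalBytes} bytes");

await transferUtility.UploadAsync(uploadRequest);
```

TransferUtility switches to multipart upload automatically for large payloads, which is why it needs no explicit initiate/complete calls.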

The low-level API is, obviously, more complicated but more flexible: you initiate the upload, and then upload each part of the file in a loop. Sample code from the documentation:

var s3Client = new AmazonS3Client(Amazon.RegionEndpoint.USEast1);

// List to store upload part responses.
var uploadResponses = new List<UploadPartResponse>();

// 1. Initialize.
var initiateRequest = new InitiateMultipartUploadRequest
{
    BucketName = existingBucketName,
    Key = keyName
};

var initResponse = s3Client.InitiateMultipartUpload(initiateRequest);

// 2. Upload Parts.
var contentLength = new FileInfo(filePath).Length;
var partSize = 5242880; // 5 MB

try
{
    long filePosition = 0;
    for (var i = 1; filePosition < contentLength; ++i)
    {
        // Create request to upload a part; the last part may be smaller.
        var uploadRequest = new UploadPartRequest
        {
            BucketName = existingBucketName,
            Key = keyName,
            UploadId = initResponse.UploadId,
            PartNumber = i,
            PartSize = Math.Min(partSize, contentLength - filePosition),
            FilePosition = filePosition,
            FilePath = filePath
        };

        // Upload part and add response to our list.
        uploadResponses.Add(s3Client.UploadPart(uploadRequest));

        filePosition += partSize;
    }

    // Step 3: complete.
    var completeRequest = new CompleteMultipartUploadRequest
    {
        BucketName = existingBucketName,
        Key = keyName,
        UploadId = initResponse.UploadId
    };

    // Add the ETags for the uploaded parts.
    completeRequest.AddPartETags(uploadResponses);

    var completeUploadResponse = s3Client.CompleteMultipartUpload(completeRequest);
}
catch (Exception exception)
{
    Console.WriteLine("Exception occurred: {0}", exception);
    var abortMPURequest = new AbortMultipartUploadRequest
    {
        BucketName = existingBucketName,
        Key = keyName,
        UploadId = initResponse.UploadId
    };
    s3Client.AbortMultipartUpload(abortMPURequest);
}

An asynchronous version of UploadPart (UploadPartAsync) is available too, so you should investigate that path if you need full control over your uploads.
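The loop above can be sketched with the async variants (InitiateMultipartUploadAsync, UploadPartAsync, CompleteMultipartUploadAsync, AbortMultipartUploadAsync all exist on IAmazonS3; the method name and parameters here are illustrative placeholders):

```csharp
using Amazon.S3;
using Amazon.S3.Model;

async Task MultipartUploadAsync(IAmazonS3 s3Client, string bucket, string key, string filePath)
{
    var init = await s3Client.InitiateMultipartUploadAsync(
        new InitiateMultipartUploadRequest { BucketName = bucket, Key = key });

    var partETags = new List<PartETag>();
    var contentLength = new FileInfo(filePath).Length;
    const long partSize = 5 * 1024 * 1024; // 5 MB minimum for all but the last part

    try
    {
        long filePosition = 0;
        for (var i = 1; filePosition < contentLength; ++i)
        {
            // Each part is read from the file on demand, never all at once.
            var partResponse = await s3Client.UploadPartAsync(new UploadPartRequest
            {
                BucketName = bucket,
                Key = key,
                UploadId = init.UploadId,
                PartNumber = i,
                PartSize = Math.Min(partSize, contentLength - filePosition),
                FilePosition = filePosition,
                FilePath = filePath
            });
            partETags.Add(new PartETag(partResponse.PartNumber, partResponse.ETag));
            filePosition += partSize;
        }

        await s3Client.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest
        {
            BucketName = bucket,
            Key = key,
            UploadId = init.UploadId,
            PartETags = partETags
        });
    }
    catch
    {
        // Abort so the incomplete parts don't keep accruing storage charges.
        await s3Client.AbortMultipartUploadAsync(
            new AbortMultipartUploadRequest { BucketName = bucket, Key = key, UploadId = init.UploadId });
        throw;
    }
}
```

Note that on .NET Core / modern AWSSDK.S3 versions the async methods are the only ones available, so this shape is what you would use from an ASP.NET Core Web API anyway.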

answered Oct 01 '22 by VMAtm