I'm working on a machine with limited memory, and I'd like to upload a dynamically generated (not-from-disk) file in a streaming manner to S3. In other words, I don't know the file size when I start the upload, but I'll know it by the end. Normally a PUT request has a Content-Length header, but perhaps there is a way around this, such as using a multipart upload or chunked transfer encoding.
S3 can support streaming uploads. For example, see here:
http://blog.odonnell.nu/posts/streaming-uploads-s3-python-and-poster/
My question is, can I accomplish the same thing without having to specify the file length at the start of the upload?
You can set up an Amazon Kinesis Data Firehose delivery stream to start streaming your data into Amazon S3 buckets using the following steps:
Step 1: Sign in to the AWS Console for Amazon Kinesis.
Step 2: Configure the delivery stream.
Step 3: Transform records using a Lambda function.
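Once such a delivery stream exists, you can push records to it from code and let Firehose buffer them into the S3 bucket. A minimal sketch with boto3, assuming a delivery stream named 'my-delivery-stream' was already created via the console steps above:

import boto3

firehose = boto3.client('firehose')

# 'my-delivery-stream' is a placeholder for the delivery stream configured above.
firehose.put_record(
    DeliveryStreamName='my-delivery-stream',
    Record={'Data': b'one line of generated data\n'},
)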
When you upload large files to Amazon S3, it's a best practice to leverage multipart uploads. If you're using the AWS Command Line Interface (AWS CLI), then all high-level aws s3 commands automatically perform a multipart upload when the object is large. These high-level commands include aws s3 cp and aws s3 sync.
Upload a single object using the Amazon S3 Console: with the console, you can upload a single object up to 160 GB in size.
Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB. For objects larger than 100 MB, customers should consider using the Multipart Upload capability.
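On the SDK side, the high-level transfer helpers behave like those CLI commands: for example, boto3's upload_fileobj accepts any readable, binary file-like object and performs a multipart upload automatically, so the total size does not have to be known up front. A minimal sketch, assuming boto3 and hypothetical bucket/key names:

import io
import boto3

s3 = boto3.client('s3')

# Any readable binary stream works here; the transfer manager splits it
# into parts automatically. Bucket and key names are placeholders.
stream = io.BytesIO(b'dynamically generated data')
s3.upload_fileobj(stream, 'my-bucket', 'generated-file.bin')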
You have to upload your file in 5MiB+ chunks via S3's multipart API. Each of those chunks requires a Content-Length, but you can avoid loading huge amounts of data (100MiB+) into memory.
S3 allows up to 10,000 parts, so with a part size of 5MiB you can upload dynamically generated files of up to about 50GiB. That should be enough for most use cases.
However, if you need more, you have to increase your part size, either by using a larger fixed part size (10MiB, for example) or by increasing it during the upload, for example:
First 25 parts: 5MiB (total: 125MiB)
Next 25 parts: 10MiB (total: 375MiB)
Next 25 parts: 25MiB (total: 1GiB)
Next 25 parts: 50MiB (total: 2.25GiB)
After that: 100MiB
This will allow you to upload files of up to 1TB (S3's limit for a single file is 5TB right now) without wasting memory unnecessarily.
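The answer above describes the technique without the API calls, so here is a minimal sketch of a streaming multipart upload in Python with boto3, assuming a fixed 5MiB part size and hypothetical bucket/key names (an illustration of the idea, not the original author's code). Only one part is held in memory at a time, and the total size never has to be known up front; for files beyond roughly 50GiB you would grow PART_SIZE during the loop as described above.

import boto3

s3 = boto3.client('s3')
BUCKET, KEY = 'my-bucket', 'generated-file.bin'   # hypothetical names
PART_SIZE = 5 * 1024 * 1024                       # 5MiB minimum part size

def stream_to_s3(chunks):
    """Upload an iterable of byte strings whose total length is unknown."""
    upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)['UploadId']
    parts, buffer, part_number = [], b'', 1
    try:
        for chunk in chunks:
            buffer += chunk
            # Flush full-size parts as soon as enough data has accumulated.
            while len(buffer) >= PART_SIZE:
                body, buffer = buffer[:PART_SIZE], buffer[PART_SIZE:]
                resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                                      PartNumber=part_number, Body=body)
                parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
                part_number += 1
        if buffer:  # the final part is allowed to be smaller than 5MiB
            resp = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                                  PartNumber=part_number, Body=buffer)
            parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
        s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                                     MultipartUpload={'Parts': parts})
    except Exception:
        s3.abort_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id)
        raise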
The blog author's problem is different from yours: he knows and uses the Content-Length before the upload. What he wants to improve on is this situation: many libraries handle uploads by loading all of the data from a file into memory. In pseudo-code, that would be something like this:
data = File.read(file_name)
request = new S3::PutFileRequest()
request.setHeader('Content-Length', data.size)
request.setBody(data)
request.send()
His solution gets the Content-Length via the filesystem API and then streams the data from disk into the request stream. In pseudo-code:
upload = new S3::PutFileRequestStream()
upload.writeHeader('Content-Length', File.getSize(file_name))
upload.flushHeader()

input = File.open(file_name, File::READONLY_FLAG)
while (data = input.read())
    upload.write(data)
end

upload.flush()
upload.close()
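For comparison, that known-length, stream-from-disk pattern is a one-liner with boto3's put_object, which can take an open file object as the body and derives the Content-Length from the file on disk, so the data is streamed rather than read into memory first. A sketch with hypothetical bucket/key names:

import boto3

s3 = boto3.client('s3')

# The body is streamed from disk; boto3 determines the Content-Length by
# seeking the file. Bucket and key names are placeholders.
with open('large-file.bin', 'rb') as f:
    s3.put_object(Bucket='my-bucket', Key='large-file.bin', Body=f)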
Putting this answer here for others in case it helps:
If you don't know the length of the data you are streaming up to S3, you can use S3FileInfo and its OpenWrite() method to write arbitrary data into S3.
var fileInfo = new S3FileInfo(amazonS3Client, "MyBucket", "streamed-file.txt");
using (var outputStream = fileInfo.OpenWrite())
{
    using (var streamWriter = new StreamWriter(outputStream))
    {
        streamWriter.WriteLine("Hello world");
        // You can do as many writes as you want here
    }
}