We are uploading various files to S3 via the Ruby AWS SDK (v2) from a Windows machine. We have tested with Ruby 1.9. Our code works fine except on large files, where an out-of-memory error is thrown.
At first we were reading the whole file into memory with this code:
:body => IO.binread(filepath),
Then after Googling we found that there were ways to read the file in chunks with Ruby:
:body => File.open(filepath, 'rb') { |io| io.read },
This code did not resolve the issue though, and we can't find a specific S3 (or related) example which shows how the file can be read and passed to S3 in chunks. The whole file is still loaded into memory and throws an out of memory error with large files.
We know we can split the file into chunks and upload to S3 using the AWS multi part upload, however the preference would be to avoid this if possible (although it's fine if it's the only way).
Our code sample is below. What is the best way to read the file in chunks, avoiding the out of memory errors, and upload to S3?
require 'aws-sdk'
filepath = 'c:\path\to\some\large\file.big'
bucket = 's3-bucket-name'
s3key = 'some/s3/key/file.big'
accesskeyid = 'ACCESSKEYID'
accesskey = 'ACCESSKEYHERE'
region = 'aws-region-here'
s3 = Aws::S3::Client.new(
  :access_key_id => accesskeyid,
  :secret_access_key => accesskey,
  :region => region
)
resp = s3.put_object(
  :bucket => bucket,
  :key => s3key,
  :body => File.open(filepath, 'rb') { |io| io.read },
)
Note that we are not hitting the S3 5GB limit; this happens with files of around 1.5GB, for example.
Instead of using the Amazon S3 console, try uploading the file using the AWS Command Line Interface (AWS CLI) or an AWS SDK. Note: If you use the Amazon S3 console, the maximum file size for uploads is 160 GB. To upload a file that is larger than 160 GB, use the AWS CLI, AWS SDK, or Amazon S3 REST API.
Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB.
The size of an object in S3 can be from a minimum of 0 bytes to a maximum of 5 terabytes, so, if you are looking to upload an object larger than 5 gigabytes, you need to use either multipart upload or split the file into logical chunks of up to 5GB and upload them manually as regular uploads.
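To make the multipart arithmetic concrete, here is a rough sketch of how a part size and part count could be planned for a given file size. It assumes the commonly documented S3 multipart limits (at least 5 MiB per part except the last, at most 10,000 parts per upload). The `plan_parts` helper is hypothetical and not part of the aws-sdk gem.

```ruby
# Commonly documented S3 multipart limits (assumed here, check the current S3 docs).
MIN_PART_SIZE = 5 * 1024 * 1024 # 5 MiB minimum for every part except the last
MAX_PARTS     = 10_000          # maximum number of parts in one multipart upload

# Hypothetical helper: returns [part_size, part_count] for a file of file_size bytes.
def plan_parts(file_size, part_size = MIN_PART_SIZE)
  # Smallest part size that still fits the file into MAX_PARTS parts (ceiling division).
  min_needed = (file_size + MAX_PARTS - 1) / MAX_PARTS
  part_size  = [part_size, min_needed].max
  # Number of parts at that size, rounding up for the final partial part.
  part_count = (file_size + part_size - 1) / part_size
  [part_size, part_count]
end

# A 1.5 GiB file, similar to the one in the question.
size, count = plan_parts(1536 * 1024 * 1024)
puts "part size: #{size} bytes, parts: #{count}"
```

With the default 5 MiB part size, a 1.5 GiB file comes out at 308 parts, well under the part limit. Only much larger files force the part size up.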
The v2 AWS SDK for Ruby, the aws-sdk gem, supports streaming objects directly over the network without loading them into memory. Your example requires only a small correction to do this:
File.open(filepath, 'rb') do |file|
  resp = s3.put_object(
    :bucket => bucket,
    :key => s3key,
    :body => file
  )
end
This works because it allows the SDK to call #read on the file object, passing in a small number of bytes each time. Calling #read on a Ruby IO object, such as a file, without a first argument reads the entire object into memory and returns it as a string. This is what caused your out-of-memory errors.
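The difference between the two ways of calling #read can be seen without S3 at all. The sketch below uses a StringIO as a stand-in for a large file (an assumption made so the example stays self-contained). The 4-byte chunk size is arbitrary and is not the size the SDK uses internally.

```ruby
require 'stringio'

# Stand-in for a large file: StringIO responds to #read the same way a File does.
io = StringIO.new('0123456789')

# With a length argument, #read returns at most that many bytes per call
# and nil at end of stream, so memory use is bounded by the chunk size.
chunks = []
while (chunk = io.read(4))
  chunks << chunk
end

# Without an argument, #read returns everything that remains as one string.
io.rewind
whole = io.read

puts chunks.inspect
puts whole.bytesize
```

Passing the open file as :body lets the SDK drive the first style of reading. Passing the result of io.read hands it the second.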
That said, the aws-sdk gem provides another, more useful interface for uploading files to Amazon S3. Among other things, this alternative interface automatically uses the multipart upload APIs for large objects.
A simple example:
# notice this uses Resource, not Client
s3 = Aws::S3::Resource.new(
  :access_key_id => accesskeyid,
  :secret_access_key => accesskey,
  :region => region
)
s3.bucket(bucket).object(s3key).upload_file(filepath)
This is part of the aws-sdk resource interfaces. There are quite a few helpful utilities in here. The Client class only provides basic API functionality.