Uploading Large File to S3 with Ruby Fails with Out of Memory Error, How to Read and Upload in Chunks?

Tags: ruby, file-upload, amazon-web-services, amazon-s3, aws-sdk

We are uploading various files to S3 via the Ruby AWS SDK (v2) from a Windows machine. We have tested with Ruby 1.9. Our code works fine except with large files, where an out-of-memory error is thrown.

At first we were reading the whole file into memory with this code:

:body => IO.binread(filepath),

Then after Googling we found that there were ways to read the file in chunks with Ruby:

:body =>  File.open(filepath, 'rb') { |io| io.read },

This code did not resolve the issue though, and we can't find a specific S3 (or related) example which shows how the file can be read and passed to S3 in chunks. The whole file is still loaded into memory and throws an out of memory error with large files.

We know we can split the file into chunks and upload to S3 using the AWS multipart upload; however, we would prefer to avoid this if possible (although it's fine if it's the only way).

Our code sample is below. What is the best way to read the file in chunks, avoiding the out of memory errors, and upload to S3?

require 'aws-sdk'

filepath = 'c:\path\to\some\large\file.big'
bucket = 's3-bucket-name'
s3key = 'some/s3/key/file.big'
accesskeyid = 'ACCESSKEYID'
accesskey = 'ACCESSKEYHERE'
region = 'aws-region-here'

s3 = Aws::S3::Client.new(
  :access_key_id => accesskeyid,
  :secret_access_key => accesskey,
  :region => region
  )

resp = s3.put_object(
  :bucket => bucket,
  :key => s3key,
  :body =>  File.open(filepath, 'rb') { |io| io.read },
  )

Note that we are not hitting the S3 5GB limit; this happens with files of, for example, 1.5GB.

asked Mar 17 '15 16:03

jotap

1 Answer

The v2 AWS SDK for Ruby (the aws-sdk gem) supports streaming objects directly over the network without loading them into memory. Your example requires only a small correction to do this:

File.open(filepath, 'rb') do |file|
  resp = s3.put_object(
   :bucket => bucket,
   :key => s3key,
   :body => file
  )
end

This works because it allows the SDK to call #read on the file object passing in a small number of bytes each time. Calling #read on a Ruby IO object, such as a file, without a first argument will read the entire object into memory, returning it as a string. This is what has caused your out-of-memory errors.
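The difference between the two ways of calling #read can be seen without S3 at all. The following is a minimal sketch using only the Ruby standard library; the helper name each_chunk, the 16 KB chunk size, and the temporary demo file are assumptions made for illustration and are not how the SDK itself is implemented.

```ruby
require 'tempfile'

# Hypothetical helper: yields a file's contents in fixed-size pieces,
# roughly the way a streaming consumer would pull data from an IO object.
def each_chunk(path, chunk_size = 16 * 1024)
  File.open(path, 'rb') do |io|
    # IO#read(length) returns at most `length` bytes, or nil at end of file,
    # so only one chunk is held in memory at a time.
    while (chunk = io.read(chunk_size))
      yield chunk
    end
  end
end

# Demo file of 100,000 bytes (an arbitrary size chosen for this sketch).
demo = Tempfile.new('chunk-demo')
demo.binmode
demo.write('x' * 100_000)
demo.close

chunk_sizes = []
each_chunk(demo.path) { |chunk| chunk_sizes << chunk.bytesize }

# By contrast, #read with no argument returns the whole file as one String.
whole = File.open(demo.path, 'rb') { |io| io.read }

demo.unlink
```

With a 1.5GB file, the last form would need a 1.5GB String, while the chunked loop never holds more than one chunk at a time.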

That said, the aws-sdk gem provides another, more useful interface for uploading files to Amazon S3. This alternative interface automatically:

Uses multipart APIs for large objects
Can use multiple threads to upload parts in parallel, improving upload speed
Computes MD5s of data client-side for service-side data integrity checks.

A simple example:

# notice this uses Resource, not Client
s3 = Aws::S3::Resource.new(
  :access_key_id => accesskeyid,
  :secret_access_key => accesskey,
  :region => region
)

s3.bucket(bucket).object(s3key).upload_file(filepath)

This is part of the aws-sdk resource interfaces. There are quite a few helpful utilities in here. The Client class only provides basic API functionality.
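As an illustration of the client-side MD5 step mentioned above, a checksum can also be computed without loading the file into memory. This is only a sketch using Ruby's standard library, not the SDK's internal code; the helper name streaming_md5_base64, the chunk sizes, and the demo file are hypothetical. S3 expects MD5 values in the Content-MD5 header as a Base64-encoded digest, which is why the sketch returns that form.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: feeds a file to MD5 one chunk at a time and
# returns the Base64-encoded digest.
def streaming_md5_base64(path, chunk_size = 1024 * 1024)
  md5 = Digest::MD5.new
  File.open(path, 'rb') do |io|
    # Read at most chunk_size bytes per iteration; nil signals end of file.
    while (chunk = io.read(chunk_size))
      md5.update(chunk)
    end
  end
  md5.base64digest
end

# Demo file of about 3MB (an arbitrary size chosen for this sketch).
sample = Tempfile.new('md5-demo')
sample.binmode
sample.write('abc' * 1_000_000)
sample.close

streamed_md5 = streaming_md5_base64(sample.path, 64 * 1024)

# For comparison only: the same digest computed from the whole file at once.
whole_md5 = Digest::MD5.base64digest(File.binread(sample.path))

sample.unlink
```

Both approaches should produce the same digest; the streamed version simply keeps memory use bounded by the chunk size.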

answered Oct 13 '22 01:10

Trevor Rowe
