I am dealing with potentially huge CSV files which I want to export from my Rails app, and since it runs on Heroku, my idea was to stream these CSV files directly to S3 while generating them.
Now I have an issue: Aws::S3 expects a file in order to perform an upload, while in my Rails app I would like to do something like:
S3.bucket('my-bucket').object('my-csv') << %w(this is one line)
How can I achieve this?
You can use S3 multipart upload, which lets you upload a large object in multiple chunks: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
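For reference, here is a minimal sketch of what the low-level multipart calls look like; the bucket, key, region, and part body are placeholders, and a real upload would loop over many parts:

require "aws-sdk-s3"

client = Aws::S3::Client.new(region: 'ap-northeast-1')

# 1. Start a multipart upload and keep its id.
mpu = client.create_multipart_upload(bucket: 'your-bucket-here', key: 'path-to-output')

# 2. Upload each chunk as a numbered part
#    (every part except the last must be at least 5 MB).
part = client.upload_part(
  bucket: 'your-bucket-here',
  key: 'path-to-output',
  upload_id: mpu.upload_id,
  part_number: 1,
  body: 'first chunk of csv data'
)

# 3. Ask S3 to assemble the parts into the final object.
client.complete_multipart_upload(
  bucket: 'your-bucket-here',
  key: 'path-to-output',
  upload_id: mpu.upload_id,
  multipart_upload: { parts: [{ etag: part.etag, part_number: 1 }] }
)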
Multipart upload normally requires more complex coding like the above, but aws-sdk-ruby v3 provides the upload_stream method, which performs the multipart upload internally and is very easy to use. It may be the exact solution for this use case.
https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method
require "aws-sdk-s3"

client = Aws::S3::Client.new(
  region: 'ap-northeast-1',
  credentials: your_credential
)
obj = Aws::S3::Object.new('your-bucket-here', 'path-to-output', client: client)
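Here your_credential stands for whatever credentials object you use; the simplest sketch is a static Aws::Credentials built from placeholder keys:

your_credential = Aws::Credentials.new('YOUR_ACCESS_KEY_ID', 'YOUR_SECRET_ACCESS_KEY')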
require "csv"
obj.upload_stream do |write_stream|
[
%w(this is first line),
%w(this is second line),
%w(this is third line),
].each do |line|
write_stream << line.to_csv
end
end
this,is,first,line
this,is,second,line
this,is,third,line
The write_stream argument yielded to the upload_stream block can usually be used as an IO object, which allows you to chain and wrap the CSV generation just as you would for a file or any other IO object:
obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    [
      %w(this is first line),
      %w(this is second line),
      %w(this is third line),
    ].each do |line|
      csv << line
    end
  end
end
Or, for example, you can compress the CSV while you generate and upload it, using a tempfile to reduce the memory footprint:
require "zlib"

obj.upload_stream(tempfile: true) do |write_stream|
  # When uploading compressed data, use binmode to avoid an encoding error.
  write_stream.binmode
  Zlib::GzipWriter.wrap(write_stream) do |gzw|
    CSV(gzw) do |csv|
      [
        %w(this is first line),
        %w(this is second line),
        %w(this is third line),
      ].each do |line|
        csv << line
      end
    end
  end
end
Edited: in the compressed example you have to call binmode on the stream, otherwise the upload fails with the following error:
Aws::S3::MultipartUploadError: multipart upload failed: "\x8D" from ASCII-8BIT to UTF-8
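To sanity-check the result, you can download the object and decompress it in memory; a small sketch, reusing the obj from above (Zlib.gunzip requires Ruby 2.4+):

require "zlib"

compressed = obj.get.body.read   # raw gzip bytes downloaded from S3
puts Zlib.gunzip(compressed)     # prints the original CSV text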
require "csv"

s3 = Aws::S3::Resource.new(region: 'us-west-2')
# Note: bucket takes the bucket name; it was missing in the original snippet.
obj = s3.bucket('your-bucket-name').object("#{FOLDER_NAME}/#{file_name}.csv")

file_csv = CSV.generate do |csv|
  csv << ActionLog.column_names
  ActionLog.all.each do |action_log|
    csv << action_log.attributes.values
  end
end

obj.put body: file_csv
file_csv = CSV.generate creates the whole CSV data as a single string in Ruby. After building this string, we put it to S3 at the path #{FOLDER_NAME}/#{file_name}.csv. In my code, I export all the data from an ActionLog model. Note that the entire CSV is held in memory before uploading, so this suits smaller exports better than the huge files from the question.
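If the table is large, the same export can instead be streamed with the upload_stream approach from the answer above, so the full CSV never lives in memory at once; a sketch reusing the ActionLog model (find_each is the standard Active Record batching method):

require "csv"

obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    csv << ActionLog.column_names
    # find_each fetches rows in batches of 1000 instead of loading them all at once.
    ActionLog.find_each do |action_log|
      csv << action_log.attributes.values
    end
  end
end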