I am dealing with potentially huge CSV files which I want to export from my Rails app, and since it runs on Heroku, my idea was to stream these CSV files directly to S3 while generating them.
Now I have an issue: Aws::S3 expects a file in order to perform an upload, while in my Rails app I would like to do something like:
S3.bucket('my-bucket').object('my-csv') << %w(this is one line)
How can I achieve this?
You can use S3 multipart upload, which lets you upload a large object in multiple chunks: https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
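For reference, here is a minimal sketch of what the low-level multipart calls look like; the bucket, key, region, and part body are placeholders, and a real upload would loop over many parts:

require "aws-sdk-s3"

client = Aws::S3::Client.new(region: 'ap-northeast-1')

# 1. Start a multipart upload and keep its id.
mpu = client.create_multipart_upload(bucket: 'your-bucket-here', key: 'path-to-output')

# 2. Upload each chunk as a numbered part
#    (every part except the last must be at least 5 MB).
part = client.upload_part(
  bucket: 'your-bucket-here',
  key: 'path-to-output',
  upload_id: mpu.upload_id,
  part_number: 1,
  body: 'first chunk of csv data'
)

# 3. Ask S3 to assemble the parts into the final object.
client.complete_multipart_upload(
  bucket: 'your-bucket-here',
  key: 'path-to-output',
  upload_id: mpu.upload_id,
  multipart_upload: { parts: [{ etag: part.etag, part_number: 1 }] }
)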
Multipart upload normally requires more complex coding like the above, but aws-sdk-ruby v3 provides the upload_stream method, which performs the multipart upload internally and is very easy to use. It may be the exact solution for this use case.
https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method
require "aws-sdk-s3"

client = Aws::S3::Client.new(
  region: 'ap-northeast-1',
  credentials: your_credential
)
obj = Aws::S3::Object.new('your-bucket-here', 'path-to-output', client: client)
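Here your_credential stands for whatever credentials object you use; the simplest sketch is a static Aws::Credentials built from placeholder keys:

your_credential = Aws::Credentials.new('YOUR_ACCESS_KEY_ID', 'YOUR_SECRET_ACCESS_KEY')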
require "csv"
obj.upload_stream do |write_stream|
[
%w(this is first line),
%w(this is second line),
%w(this is third line),
].each do |line|
write_stream << line.to_csv
end
end
this,is,first,line
this,is,second,line
this,is,third,line
The write_stream argument yielded to the upload_stream block can usually be used as an IO object, which allows you to chain and wrap the CSV generation just as you would for a file or any other IO object:
obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    [
      %w(this is first line),
      %w(this is second line),
      %w(this is third line),
    ].each do |line|
      csv << line
    end
  end
end
Or, for example, you can compress the CSV while you generate and upload it, using a tempfile to reduce the memory footprint:
require "zlib"

obj.upload_stream(tempfile: true) do |write_stream|
  # When uploading compressed data, use binmode to avoid an encoding error.
  write_stream.binmode
  Zlib::GzipWriter.wrap(write_stream) do |gzw|
    CSV(gzw) do |csv|
      [
        %w(this is first line),
        %w(this is second line),
        %w(this is third line),
      ].each do |line|
        csv << line
      end
    end
  end
end
Edited: in the compressed example you have to call binmode on the stream, otherwise the upload fails with the following error:
Aws::S3::MultipartUploadError: multipart upload failed: "\x8D" from ASCII-8BIT to UTF-8
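To sanity-check the result, you can download the object and decompress it in memory; a small sketch, reusing the obj from above (Zlib.gunzip requires Ruby 2.4+):

require "zlib"

compressed = obj.get.body.read   # raw gzip bytes downloaded from S3
puts Zlib.gunzip(compressed)     # prints the original CSV text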
require "csv"

s3 = Aws::S3::Resource.new(region: 'us-west-2')
# Note: bucket takes the bucket name; it was missing in the original snippet.
obj = s3.bucket('your-bucket-name').object("#{FOLDER_NAME}/#{file_name}.csv")

file_csv = CSV.generate do |csv|
  csv << ActionLog.column_names
  ActionLog.all.each do |action_log|
    csv << action_log.attributes.values
  end
end

obj.put body: file_csv
file_csv = CSV.generate creates the whole CSV data as a single string in Ruby. After building this string, we put it to S3 at the path #{FOLDER_NAME}/#{file_name}.csv. In my code, I export all the data from an ActionLog model. Note that the entire CSV is held in memory before uploading, so this suits smaller exports better than the huge files from the question.
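If the table is large, the same export can instead be streamed with the upload_stream approach from the answer above, so the full CSV never lives in memory at once; a sketch reusing the ActionLog model (find_each is the standard Active Record batching method):

require "csv"

obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    csv << ActionLog.column_names
    # find_each fetches rows in batches of 1000 instead of loading them all at once.
    ActionLog.find_each do |action_log|
      csv << action_log.attributes.values
    end
  end
end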