I'm trying to read a large number of rows from the database (over 100,000) and write them to a CSV file on an Ubuntu VPS. The server doesn't have enough memory to do it all at once.
I was thinking about reading 5000 rows at a time, writing them to the file, then reading another 5000, and so on.
How should I restructure my current code so that memory isn't exhausted?
Here's my code:
def write_rows(emails)
  File.open(file_path, "w+") do |f|
    f << "email,name,ip,created\n"
    emails.each do |l|
      f << [l.email, l.name, l.ip, l.created_at].join(",") + "\n"
    end
  end
end
The function is called from a Sidekiq worker with:
write_rows(user.emails)
Thanks for the help!
The problem here is that when you call emails.each, ActiveRecord loads all the records from the database and keeps them in memory. To avoid this, you can use the method find_each:
require 'csv'

BATCH_SIZE = 5000

def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}
    emails.find_each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end
By default, find_each loads records in batches of 1000 at a time. If you want to load batches of 5000 records, you have to pass the :batch_size option to find_each:
emails.find_each(:batch_size => 5000) do |email|
  # ...
end
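For reference, here is how the pieces might fit together inside the Sidekiq worker. This is only a sketch: the ExportEmailsWorker name, the file_path helper, and the output location are assumptions; the User model and columns come from your question.

require 'csv'
require 'sidekiq'

class ExportEmailsWorker
  include Sidekiq::Worker

  BATCH_SIZE = 5000

  def perform(user_id)
    user = User.find(user_id)
    write_rows(user.emails)
  end

  private

  def write_rows(emails)
    CSV.open(file_path, 'w') do |csv|
      csv << %w{email name ip created}
      # find_each runs successive queries, keeping at most BATCH_SIZE
      # records in memory at any point
      emails.find_each(:batch_size => BATCH_SIZE) do |email|
        csv << [email.email, email.name, email.ip, email.created_at]
      end
    end
  end

  def file_path
    # placeholder path, adjust to wherever the export should be written
    Rails.root.join('tmp', 'emails.csv')
  end
end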
More information about the find_each method (and the related find_in_batches) can be found in the Ruby on Rails Guides.
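If you prefer to handle a whole chunk at a time rather than one record per iteration, find_in_batches yields an array of up to :batch_size records per call. A minimal sketch, reusing the same emails relation, file_path, and columns as above:

require 'csv'

CSV.open(file_path, 'w') do |csv|
  csv << %w{email name ip created}
  # each batch is an Array of up to 5000 records
  emails.find_in_batches(:batch_size => 5000) do |batch|
    batch.each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end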
I've used the CSV class to write the file instead of joining fields and lines by hand. This is not intended to be a performance optimization, since writing to the file shouldn't be the bottleneck here.
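It does buy you correct quoting, though: joining with "," by hand breaks as soon as a field itself contains a comma. A quick illustration (the sample values are made up):

require 'csv'

# CSV quotes fields that contain commas or quotes,
# which a plain join(",") would silently get wrong
CSV.generate_line(["user@example.com", "Doe, Jane", "10.0.0.1"])
# => "user@example.com,\"Doe, Jane\",10.0.0.1\n"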