The requirement is as follows: we fetch a huge dataset from the database (> 1 billion records) and need to export it to a CSV or Excel file.
The current implementation uses the CSV class with CSV.generate:
CSV.generate(headers: true) do |csv|
  csv << header
  @obj.find_each do |c|
    arr = [c.id, c.name, c.soon]
    csv << arr
  end
end
and then writes the output into a zip:
Zip::File.open(file, Zip::File::CREATE) do |zip|
  zip.get_output_stream("test.#{@format}") { |f| f.puts(convert_to_csv) }
end
All of this runs inside a Delayed Job. It works fine when the record count is under 20,000, but as the number of rows grows we start running into memory issues.
What I was thinking is to split the records into chunks, say 1 million rows into 50 files of 20,000 rows each (csv1.csv, csv2.csv, csv3.csv, ...), and then either concatenate them into a single file or zip all the files together (whichever is faster), roughly as sketched below.
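Something along these lines is what I have in mind (a rough sketch only; find_in_batches is ActiveRecord's batching API, and the batch size and file names are placeholders):

require 'csv'
require 'zip'

# Write one CSV per 20,000-row batch, then zip all the pieces together.
file_paths = []
@obj.find_in_batches(batch_size: 20_000).with_index do |batch, i|
  path = "csv#{i + 1}.csv"
  CSV.open(path, "w") do |csv|
    csv << header
    batch.each { |c| csv << [c.id, c.name, c.soon] }
  end
  file_paths << path
end

Zip::File.open(file, Zip::File::CREATE) do |zip|
  file_paths.each { |path| zip.add(File.basename(path), path) }
end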
Can anyone give me an idea of how to start on this?
Taking a look at the source for CSV.generate gives me the impression that the CSV data is kept in memory while the contents are accumulated. That seems like a good target for optimization, especially if you see memory scaling linearly with the data set. Since your data is pretty simple, could you skip CSV and write directly to a File instead? You'd have a bit more control over when data gets flushed out to disk.
File.open("my.csv") do |file|
file.puts '"ID","Name","Soon"'
@obj.find_each do |c|
file.puts "\"#{c.id}\",\"#{c.name}\",\"#{c.soon}\""
# flush if necessary
end
end
You'd need to write to disk and then zip the results later with this approach.
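For the zipping step, something like this rubyzip sketch could work (the archive and CSV names here are just examples):

require 'zip'

# Add the already-written CSV file to a new zip archive.
Zip::File.open("export.zip", Zip::File::CREATE) do |zip|
  zip.add("my.csv", "my.csv")
end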