Generating a CSV/Excel file for 100 million records in Ruby on Rails?

The requirement is as follows:

We get a huge dataset from the database (> 1 billion records) and need to export it to a CSV or Excel file.

The current implementation uses the CSV class's CSV.generate:

CSV.generate(headers: true) do |csv|
  csv << header
  @obj.find_each do |c|
    arr = [c.id, c.name, c.soon]  # ...and so on for the remaining columns
    csv << arr
  end
end

and sends the output to:

Zip::File.open(file, Zip::File::CREATE) do |zip|
  zip.get_output_stream("test.#{@format}") { |f| f.puts(convert_to_csv) }
end

All of this runs inside delayed jobs. It works fine when there are fewer than about 20,000 records, but as the number of rows grows it starts running into memory issues.

What I was thinking is to chunk the records into pieces, say splitting 1 million rows across 50 files of 20,000 rows each (csv1.csv, csv2.csv, csv3.csv, csv4.csv, csv5.csv, ...), and then either concatenate them into a single file or zip all the files together (whichever is faster).

Can anyone give me an idea of how to start on this?
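Roughly, this is what I have in mind, as a sketch only: it assumes the rubyzip gem already used above, c.soon stands in for the remaining columns, and the file names and chunk size are placeholders.

require 'csv'
require 'zip'

rows_per_file = 1_000_000              # illustrative chunk size
header        = ['ID', 'Name', 'Soon']
part_paths    = []
csv           = nil
rows_written  = 0

@obj.find_each(batch_size: 20_000) do |c|
  # Start a new part file every rows_per_file rows.
  if (rows_written % rows_per_file).zero?
    csv&.close
    part_paths << format('part_%04d.csv', part_paths.size + 1)
    csv = CSV.open(part_paths.last, 'w')
    csv << header
  end
  csv << [c.id, c.name, c.soon]
  rows_written += 1
end
csv&.close

# Zip all the part files together so only one archive has to be handed back.
Zip::File.open('export.zip', Zip::File::CREATE) do |zip|
  part_paths.each { |path| zip.add(File.basename(path), path) }
end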

Asked May 29 '19 by Kunal Vashist

1 Answer

Taking a look at the source for CSV.generate gives me the impression that the CSV data is kept in memory while the contents are being accumulated. That seems like a good target for optimization, especially if you see memory usage scaling linearly with the size of the data set. Since your data is pretty simple, could you skip CSV and go directly to File instead? You'd have a bit more control over when data is flushed out to disk.

File.open("my.csv") do |file|
  file.puts '"ID","Name","Soon"'
  @obj.find_each do |c|
    file.puts "\"#{c.id}\",\"#{c.name}\",\"#{c.soon}\""
    # flush if necessary
  end
end

You'd need to write to disk and then zip the results later with this approach.
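For the zip step, a minimal sketch assuming the same rubyzip gem from the question (the file names are placeholders):

require 'zip'

Zip::File.open('my.zip', Zip::File::CREATE) do |zip|
  zip.add('my.csv', 'my.csv')   # add the on-disk CSV as an entry in the archive
end

If you want more control over memory, you can also call file.flush inside the loop every few thousand rows to push buffered rows out to disk.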

Answered Nov 15 '22 by AndyV