
rails - Exporting a huge CSV file consumes all RAM in production

So my app exports an 11.5 MB CSV file and in the process uses basically all of the server's RAM, which never gets freed.

The data for the CSV is taken from the DB, and in the case mentioned above the whole thing is being exported.

I am using the Ruby 2.4.1 standard CSV library in the following fashion:

export_helper.rb:

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  data = Model.scope1(param).scope2(param).includes(:model1, :model2)
  data.each do |item|
    file << [
      item.method1,
      item.method2,
      item.method3
    ]
  end
  # repeat for other models - approx. 5 other similar loops
end

and then in the controller:

generator = ExportHelper::ReportGenerator.new
generator.full_report
respond_to do |format|
  format.csv do
    send_file(
      "#{Rails.root}/full_report.csv",
      filename: 'full_report.csv',
      type: :csv,
      disposition: :attachment
    )
  end
end

After a single request the Puma processes occupy 55% of the server's RAM and stay like that until the server eventually runs out of memory completely.

For instance, in this article generating a million-line, 75 MB CSV file required only 1 MB of RAM. But there was no DB querying involved.

The server has 1015 MB RAM + 400 MB of swap memory.

So my questions are:

  • What exactly consumes so much memory? Is it the CSV generation or the communication with the DB?
  • Am I doing something wrong and missing a memory leak? Or is it just how the library works?
  • Is there a way to free up the memory without restarting the Puma workers?

Thanks in advance!

asked Dec 17 '22 by mityakoval


1 Answer

Instead of each you should be using find_each, which is designed specifically for cases like this: it instantiates the models in batches and releases them afterwards, whereas each instantiates all of them at once.

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  Model.scope1(param).find_each do |item|
    file << [
      item.method1
    ]
  end
end
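
If the second scope and the eager-loaded associations from the question are still needed, they can simply be chained before find_each; here is a sketch reusing the names from the question (the batch_size shown is just ActiveRecord's default and can be tuned):

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  # Keep the eager loading from the question to avoid N+1 queries on the
  # associations; find_each still loads the records in batches.
  Model.scope1(param).scope2(param)
       .includes(:model1, :model2)
       .find_each(batch_size: 1000) do |item|
    file << [item.method1, item.method2, item.method3]
  end
end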

Furthermore, you should stream the CSV to the browser instead of writing the whole file to memory or disk before sending it:

format.csv do
  headers["Content-Type"] = "text/csv"
  headers["Content-disposition"] = "attachment; filename=\"full_report.csv\""

  # streaming_headers
  # nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
  headers['X-Accel-Buffering'] = 'no'
  headers["Cache-Control"] ||= "no-cache"

  # Rack::ETag 2.2.x no longer respects 'Cache-Control'
  # https://github.com/rack/rack/commit/0371c69a0850e1b21448df96698e2926359f17fe#diff-1bc61e69628f29acd74010b83f44d041
  headers["Last-Modified"] = Time.current.httpdate

  headers.delete("Content-Length")
  response.status = 200

  header = ['Method 1', 'Method 2']
  csv_options = { col_sep: ";" }

  csv_enumerator = Enumerator.new do |y|
    y << CSV::Row.new(header, header).to_s(csv_options)
    Model.scope1(param).find_each do |item|
      y << CSV::Row.new(header, [item.method1, item.method2]).to_s(csv_options)
    end
  end

  # setting the body to an enumerator, rails will iterate this enumerator
  self.response_body = csv_enumerator
end
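
If you would rather keep the ExportHelper::ReportGenerator class from the question, one option is to have it return the enumerator and let the controller only set the response body. This is a hypothetical sketch: the method name full_report_rows and the constants are made up, and the columns are just the placeholders used in the answer above.

require 'csv'

module ExportHelper
  class ReportGenerator
    HEADER      = ['Method 1', 'Method 2'].freeze
    CSV_OPTIONS = { col_sep: ';' }.freeze

    # Returns a lazy Enumerator that yields one CSV line at a time,
    # so the full report is never built up in memory or on disk.
    def full_report_rows(param)
      Enumerator.new do |y|
        y << CSV::Row.new(HEADER, HEADER).to_s(CSV_OPTIONS)
        Model.scope1(param).find_each do |item|
          y << CSV::Row.new(HEADER, [item.method1, item.method2]).to_s(CSV_OPTIONS)
        end
      end
    end
  end
end

# In the controller action:
# self.response_body = ExportHelper::ReportGenerator.new.full_report_rows(param)
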
answered Apr 08 '23 by eikes