 

Ruby-on-Rails 3.2: Export a CSV with a large data set (100,000 records)

Introduction

I have an app with multiple tables, some with associations and some without.

Some tables will be required to hold around 100,000 entries.

The app runs Rails 3.2 on Ruby 1.9 and is hosted on Heroku. I have access to workers if needed.

The Requirement in Question

A vital requirement for the app is to allow users to export the data as a CSV. Part of this requirement is to let the user filter which data they want to export, but I am not worrying about that at the moment; as you will see from the code below, I have hard-coded what data is to be exported. It does, however, rule out creating a rake task to simply export the whole table.

Also, the method implemented must be factored so it can be used by multiple tables, to avoid unnecessary code repetition.

Current Solution

I am implementing delayed_job in my app and generating the CSV inside a job, following the solution by 'abdullah' at http://www.ayokasystems.com/blog/delegating-long-running-jobs-in-rails/.

The idea is to generate the data in CSV format and save it in a LONGTEXT field on the UserJobs table, so the user can download it once complete and again at any future time.
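For context, a minimal sketch of how this is wired up; the migration, UserJob model, and enqueue call are my assumptions based on the tutorial's struct-based delayed_job pattern, with ReportJob and user_job_id matching the code further down:

# Assumed migration: store the generated CSV in a LONGTEXT column
# (the MySQL :limit below forces LONGTEXT instead of TEXT)
class AddDataToUserJobs < ActiveRecord::Migration
  def change
    add_column :user_jobs, :data, :text, :limit => 4294967295
  end
end

# Assumed job class and enqueue call, following the struct-based
# delayed_job pattern from the linked tutorial
class ReportJob < Struct.new(:user_job_id)
  def perform
    # shown below in the question
  end
end

user_job = UserJob.create!
Delayed::Job.enqueue ReportJob.new(user_job.id)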

The Problem

The method from the above tutorial works fine in my app until I run a job of 100,000 records at once. To overcome this I tried adding the handy find_each method to perform, but the delayed_job worker reports an error every time it tries to process the job:

[Worker(host:*** pid:18637)] ReportJob failed with NoMethodError: undefined method `each' for #<Title:0x007ff20c1ec1b0> - 0 failed attempts
[Worker(host:*** pid:18637)] ReportJob failed with NoMethodError: undefined method `each' for #<Title:0x007ff20ec47f18> - 1 failed attempts
[Worker(host:*** pid:18637)] 2 jobs processed at 10.5219 j/s, 2 failed ... 

My code for the perform method is:

def perform
  Title.find_each do |titles|
    csv_data = CSV.generate do |csv|
      titles.each do |t|
        csv << t.to_csv
      end
    end
    user_job = UserJob.find(user_job_id)
    user_job.update_attribute :data, csv_data
  end
end

Can anyone see what the issue could be? I am thinking I have just made a stupid mistake with how I am looping through things.

I am very open to any other suggestions for how to accomplish the requirement in question, but please bear in mind the limitations I have on Heroku.

asked Sep 13 '25 by Billy


2 Answers

You are trying to iterate with each, but titles here is a single instance of Title, not an array.

csv_vals = []
columns = [:name, :release_date, :studio]

# :select limits the query to just these columns (Rails 3.x finder option)
Title.find_each(:select => columns) do |title|
  # find_each yields one Title at a time, so no nested each over records
  columns.each { |col| csv_vals << "#{title[col]}" }
end

# comma-separated string
csv_string = csv_vals.join(',')

There is a more elegant way to build the CSV string, but I am too lazy to experiment.

What is important is that you SELECT only the columns you need. For 100,000 records that cuts down DB traffic considerably. With a plain find_each you fetch every column of every row, and you don't need them.
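Building on that, here is a sketch of the "more elegant" variant: it still selects only the needed columns, but uses CSV.generate so each record becomes a proper CSV row (the column list is the one assumed above):

require 'csv'

columns = [:name, :release_date, :studio]

csv_string = CSV.generate do |csv|
  csv << columns.map(&:to_s)                 # header row
  # :select keeps the query limited to the needed columns
  Title.find_each(:select => columns) do |title|
    csv << columns.map { |col| title[col] }  # one record per CSV row
  end
end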

answered Sep 16 '25 by Haris Krajina


find_each yields a single record to the block, not a collection, hence your error when calling each on a single record. Take a look at find_in_batches, or fix your code to use the single record:

csv_data = CSV.generate do |csv|
  # find_each yields one Title at a time
  Title.find_each do |title|
    csv << title.to_csv
  end
end
user_job = UserJob.find(user_job_id)
user_job.update_attribute :data, csv_data
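If you do want to work in batches instead, a minimal find_in_batches sketch (the batch size is an assumption) would look like this; here titles really is an array, so calling each on it is valid:

csv_data = CSV.generate do |csv|
  # find_in_batches yields arrays of records, 1000 at a time here
  Title.find_in_batches(:batch_size => 1000) do |titles|
    titles.each { |t| csv << t.to_csv }
  end
end
UserJob.find(user_job_id).update_attribute :data, csv_data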
answered Sep 16 '25 by Thilo