
Ruby on Rails - Import Data from a CSV file

require 'csv'    

csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end

A simpler version of yfeldblum's answer that also works well with large files:

require 'csv'    

CSV.foreach(filename, headers: true) do |row|
  Moulding.create!(row.to_hash)
end

No need for with_indifferent_access or symbolize_keys, and no need to read the whole file into a string first.

It doesn't keep the whole file in memory at once, but reads it in line by line and creates a Moulding per line.


The smarter_csv gem was created specifically for this use case: to read data from a CSV file and quickly create database entries.

require 'smarter_csv'
options = {}
SmarterCSV.process('input_file.csv', options) do |chunk|
  chunk.each do |data_hash|
    Moulding.create!(data_hash)
  end
end

You can use the chunk_size option to read N CSV rows at a time, and then use Resque in the inner loop to generate jobs that create the new records instead of creating them right away. That way you can spread the load of generating entries across multiple workers.
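A minimal sketch of that approach, assuming a hypothetical Resque worker class called MouldingImportJob and CSV columns that map directly onto Moulding attributes:

require 'smarter_csv'
require 'resque'

# Hypothetical worker. Resque serializes job arguments as JSON,
# so the hash keys arrive in perform as strings.
class MouldingImportJob
  @queue = :moulding_import

  def self.perform(chunk)
    chunk.each do |data_hash|
      Moulding.create!(data_hash)
    end
  end
end

# Read 100 rows per chunk and enqueue one background job per chunk,
# rather than creating the records inline.
options = { chunk_size: 100 }
SmarterCSV.process('input_file.csv', options) do |chunk|
  Resque.enqueue(MouldingImportJob, chunk)
end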

See also: https://github.com/tilo/smarter_csv


You might try Upsert:

require 'upsert' # add this to your Gemfile
require 'csv'    

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end

If this is what you want, you might also consider getting rid of the auto-increment primary key on the table and making name the primary key. Alternatively, if some combination of attributes forms a natural key, use that combination as the selector. An index is not necessary; it would just make the lookups faster.
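A minimal sketch of a composite selector, assuming a hypothetical batch column that, together with name, uniquely identifies a row:

require 'upsert'
require 'csv'

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  # 'batch' is a hypothetical second column; combined with 'name' it forms the natural key
  selector = { name: row['name'], batch: row['batch'] }
  u.row selector, row.to_hash
end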


The csv-mapper library can help, and its documentation has code examples too:

http://csv-mapper.rubyforge.org/

Or, for a rake task that does the same:

http://erikonrails.snowedin.net/?p=212


It is better to wrap the database-related work inside a transaction block. The snippet below is a full process for seeding a set of languages into a Language model:

require 'csv'

namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end

The snippet below is a partial listing of the languages.csv file:

aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...