Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

efficient bulk update rails database

I'm trying to build a rake utility that will update my database every so often.

This is the code I have so far:

namespace :utils do

  # utils:update_ip
  # Downloads the file frim <url> to the temp folder then unzips it in <file_path>
  # Then updates the database.

  desc "Update ip-to-country database"
  task :update_ip => :environment do

    require 'open-uri'
    require 'zip/zipfilesystem'
    require 'csv'

    file_name = "ip-to-country.csv"
    file_path = "#{RAILS_ROOT}/db/" + file_name
    url = 'http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip'


    #check last time we updated the database.
    mod_time = ''
    mod_time = File.new(file_path).mtime.httpdate    if File.exists? file_path

    begin
      puts 'Downloading update...'
      #send conditional GET to server
      zipped_file = open(url, {'If-Modified-Since' => mod_time})
    rescue OpenURI::HTTPError => the_error
      if the_error.io.status[0] == '304'
        puts 'Nothing to update.'
      else
        puts 'HTTPError: ' + the_error.message
      end
    else # file was downloaded without error.

      Rails.logger.info 'ip-to-coutry: Remote database was last updated: ' + zipped_file.meta['last-modified']
      delay = Time.now - zipped_file.last_modified
      Rails.logger.info "ip-to-country: Database was outdated for: #{delay} seconds (#{delay / 60 / 60 / 24 } days)"

      puts 'Unzipping...'
      File.delete(file_path) if File.exists? file_path
      Zip::ZipFile.open(zipped_file.path) do |zipfile|
        zipfile.extract(file_name, file_path)
      end

      Iptocs.delete_all

      puts "Importing new database..."


      # TODO: way, way too heavy find a better solution.


      CSV.open(file_path, 'r') do |row|
        ip = Iptocs.new(  :ip_from        => row.shift,
                        :ip_to          => row.shift,
                        :country_code2  => row.shift,
                        :country_code3  => row.shift,
                        :country_name   => row.shift)
        ip.save
      end #CSV
      puts "Complete."

    end #begin-resuce
  end #task
end #namespace

The problem I'm having is that this takes a few minutes to enter the 100 thousand plus entries. I'd like to find a more efficient way of updating my database. Ideally this will remain independent of the database type, but if not my production server will be running on MySQL.

Thank you for any insight.

like image 494
codr Avatar asked Feb 17 '10 22:02

codr


1 Answers

Have you tried to use AR Extensions for bulk import? You get impressive performance improvements when you are inserting 1000's of rows to DB. Visit their website for more details.

Refer to these examples for more information

Usage Example 1

Usage Example 2

Usage Example 3

like image 137
Harish Shetty Avatar answered Oct 14 '22 20:10

Harish Shetty