Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Doing huge data migrations in rails

I am experiencing big time and computing power challenges when doing big data migration (several 100.000 rows). I am developing a service that handles a lot of data in rails. Our models are constantly changing as we get more and more cleaver about our design. This leads to a lot of migrations on our database which is a Postgres 9.0 database. Often these migrations also includes some kind of migration on the data itself. Yesterday we found out that we needed to move a 'text' attribute on a model into a separate model so that the attribute was no longer just an attribute on the model but a one to many relationship instead.

My migration looked somewhat like this:

def self.up
  create_table :car_descriptions do |t|
    t.integer :car_id
    t.text :description

    t.timestamps
  end

  Car.find_each do |car|
    if car.description.present?
      car.descriptions.build :description => car.description
    end
    car.save
  end
  remove_column :cars, :description
end

Now the problem is, that this is running pretty slow, and even worse, if I set a counter, and prints out the progress, I can see that the migration is running slower and slower over time. In my activity monitor I can see that the ruby process is taking up more and more memory.

So my question is - is there a better way to do big data migrations like this?

like image 905
Niels Kristian Avatar asked Jul 28 '11 15:07

Niels Kristian


1 Answers

You should not use ActiveRecord here for migrating the data from your Car model to CarDescription model. Instead you should retort to running raw sql(run from migration). In my last job, we had such problems with huge data, and running raw sqls resulted in much quicker migrations(although quick was still 5-6 hours sometimes). One other practice we developed over time after many bitter experiences was, we always copied our database from production to our staging server and ran the migration atleast twice on the staging. We always came up with some process(specific to migration) which was a huge time saver following this practice. Sometimes the process included, manually dropping some indexes, running the migration and manually creating those indexes again.

In current case, the sql can look something like this:

INSERT INTO car_descriptions(car_id, description) SELECT id, description FROM cars

Hope you find it useful, let me know if I can add something to the answer.

like image 149
rubish Avatar answered Nov 07 '22 08:11

rubish