Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I optimize an ActiveRecord find_in_batches query?

I'm using Rails 4.0.0 and Ruby 2.0.0. My Post (as in blog posts) model is associated with a user with a combination of the user's user_name, first_name, last_name. I'd like to migrate the data so that posts are associated to users by a foreign key, which is the user's id.

I have about 11 million records in the posts table.

I'm running the below code to migrate the data, using a rake task on a Linux server. However, my task keeps getting "Killed" by the sever, presumably due to the rake task, specifically the below code, consuming too much memory.

I've found that lowering the batch_size to 20 and increasing sleep(10) to sleep(60) allows the task to run longer, updating more records in total without being Killed, but takes significantly more time.

How can I optimize this code for speed and memory usage?

Post.where(user_id: nil).find_in_batches(batch_size: 1000) do |posts|
  puts "*** Updating batch beginning with post #{posts.first.id}..."
  sleep(10) # Hopefully, saving some memory usage.
  posts.each do |post|
    begin
      user = User.find_by(user_name: post.user_name, first_name: post.first_name, last_name: post.last_name)
      post.update(user_id: user.id)
    rescue NoMethodError => error # user could be nil, so user.id will raise a NoMethodError
      puts "No user found."
    end
  end
  puts "*** Finished batch."
end
like image 715
sealocal Avatar asked Apr 14 '16 20:04

sealocal


People also ask

What is ActiveRecord?

Active Record allows you to validate the state of a model before it gets written into the database. There are several methods that you can use to check your models and validate that an attribute value is not empty, is unique and not already in the database, follows a specific format, and many more.

What does ActiveRecord where return?

Returns a new relation, which is the result of filtering the current relation according to the conditions in the arguments.


1 Answers

Do all the work in the database which is WAY faster than moving data back and forth.

This can be accomplished with ActiveRecord. Of course PLEASE test this before you unleash it on important data.

Post
  .where(user_id: nil)
  .joins("inner join users on posts.user_name = users.user_name")
  .update_all("posts.user_id = users.id")

Further, if posts have an index on user_id, and users has an index on user_name, then that will help this particular query run more quickly.

like image 88
Mark Bolusmjak Avatar answered Nov 04 '22 23:11

Mark Bolusmjak