
Ruby on Rails memory leak when looping through large number of records; find_each doesn't help

I have a Rails app that processes a large number (millions) of records in a MySQL database. Once it starts working, its memory use quickly grows at 50MB per second. With tools like oink I was able to narrow the root cause down to one loop that goes through all the records in a big table.

I understand that if I use something like Person.all.each, all the records get loaded into memory. But if I switch to find_each, I still see the same memory problem. To isolate it further I created the following test controller, which does nothing but loop through the records. I assumed find_each keeps only a small number of objects in memory at a time, yet memory use grows linearly as it executes.

class TestController < ApplicationController
  def memory_test
    Person.find_each do |person|
    end
  end
end

I suspect it has to do with ActiveRecord caching the query results. But I checked my environment settings and all the caching-related options are set to false in development (I am using the defaults generated by Rails). I searched online but couldn't find a solution.
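
(For reference, these are the typical Rails 3.x development defaults I mean; note they govern controller/view caching, which appears to be separate from ActiveRecord's per-request SQL query cache:)

# config/environments/development.rb (typical Rails 3.x defaults)
config.cache_classes = false
config.action_controller.perform_caching = false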

I am using Rails 3.1.0.rc1 and Ruby 1.9.2.

Thanks!

asked Jul 12 '11 by WYi

3 Answers

I was able to figure this out myself. There are two changes to make.

First, disable IdentityMap in config/application.rb. IdentityMap keeps a reference to every record loaded during a request, so none of those objects can be garbage collected while the action runs:

config.active_record.identity_map = false

Second, wrap the loop in uncached, which turns off ActiveRecord's SQL query cache; otherwise every SELECT result is retained for the duration of the action:

class MemoryTestController < ApplicationController
  def go
    # Disable the SQL query cache so each batch's result set
    # can be garbage collected after it is processed.
    ActiveRecord::Base.uncached do
      Person.find_each do |person|
        # whatever operation
      end
    end
  end
end

Now my memory use is under control. Hope this helps other people.

answered by WYi


find_each calls find_in_batches with a batch size of 1000 under the hood.

All the records in a batch are instantiated and retained in memory for as long as that batch is being processed.
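
To make that concrete, here is a simplified sketch of the relationship (illustrative only, not the actual Rails source):

# Hypothetical sketch of find_each in terms of find_in_batches --
# not the real Rails implementation.
def find_each(options = {})
  find_in_batches(options) do |batch| # batch is an Array of up to batch_size records
    batch.each { |record| yield record }
  end
  # once the next batch loads, the previous Array (absent other
  # references, such as a query cache) becomes garbage collectable
end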

If your records are large, or if they hold on to a lot of memory via proxy collections (e.g. a has_many association caches all of its items once you use it), you can also try a smaller batch size:

Person.find_each batch_size: 100 do |person|
  # whatever operation
end

You can also try calling GC.start manually at intervals (e.g. every 300 records).
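
For example, a minimal sketch (the interval of 300 is arbitrary and worth tuning):

# Hypothetical sketch: force a garbage collection pass every 300 records.
count = 0
Person.find_each batch_size: 100 do |person|
  # whatever operation
  count += 1
  GC.start if count % 300 == 0
end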

answered by d4n3


As nice as ActiveRecord is, it is not the best tool for all problems. I recommend dropping down to your native database adapter and doing the work at that level.
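
For example, a rough sketch using the mysql2 gem directly, streaming rows so they are never all buffered in memory (this assumes Person maps to a people table; the connection details are placeholders, and the stream/cache_rows options require a reasonably recent mysql2):

require "mysql2"

# Rough sketch: iterate rows via the mysql2 driver, bypassing ActiveRecord.
client = Mysql2::Client.new(host: "localhost", username: "user",
                            password: "secret", database: "app_development")

result = client.query("SELECT * FROM people",
                      stream: true,      # fetch rows lazily from the server
                      cache_rows: false) # don't retain rows after yielding them
result.each do |row|
  # row is a plain Hash, e.g. row["name"]
end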

answered by Jeremy Weathers