
ActiveRecord objects in hashes aren't garbage collected -- a bug or a sort of caching feature?

I have a simple ActiveRecord model called Student with 100 records in the table. I do the following in a rails console session:

    ObjectSpace.each_object(ActiveRecord::Base).count # => 0
    x = Student.all
    ObjectSpace.each_object(ActiveRecord::Base).count # => 100
    x = nil
    GC.start
    ObjectSpace.each_object(ActiveRecord::Base).count # => 0     # Good!

Now I do the following:

    ObjectSpace.each_object(ActiveRecord::Base).count # => 0
    x = Student.all.group_by(&:last_name)
    ObjectSpace.each_object(ActiveRecord::Base).count # => 100
    x = nil
    GC.start
    ObjectSpace.each_object(ActiveRecord::Base).count # => 100     # Bad!

Can anyone explain why this happens and whether there is a smart way to solve this without knowing the underlying hash structure? I know I can do this:

    x.keys.each{|k| x[k]=nil}
    x = nil
    GC.start

and it will remove all Student objects from memory correctly, but I'm wondering if there is a general solution (my real-life problem is widespread and involves more intricate data structures than the hash shown above).

I'm using Ruby 1.9.3-p0 and Rails 3.1.0.

UPDATE (SOLVED)

Per Oscar Del Ben's explanation below, the problematic code snippet creates a few ActiveRecord::Relation objects. (They are actually created in both code snippets, but for some reason they "misbehave" only in the second one. Can someone shed light on why?) These keep references to the ActiveRecord objects in an instance variable called @records, which the "reset" method on ActiveRecord::Relation sets to nil. You have to make sure to call it on all the relation objects:

    ObjectSpace.each_object(ActiveRecord::Base).count # => 100
    ObjectSpace.each_object(ActiveRecord::Relation).each(&:reset)
    GC.start
    ObjectSpace.each_object(ActiveRecord::Base).count # => 0

Note: You can also use Mass.detach (from the ruby-mass gem Oscar Del Ben referenced), though it will be much slower than the code above. The code above also leaves a few ActiveRecord::Relation objects in memory; these seem pretty insignificant, though. You can try doing:

    Mass.index(ActiveRecord::Relation)["ActiveRecord::Relation"].each{|x| Mass.detach Mass[x]}
    GC.start

This removes some of the ActiveRecord::Relation objects, but not all of them. I'm not sure why; those that are left have no Mass.references, which is weird.
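To make the @records mechanism described in this update concrete, here is a minimal plain-Ruby sketch. It does not use Rails: FakeRelation and FakeStudent are hypothetical stand-ins invented for illustration, and the real ActiveRecord::Relation is considerably more involved. The sketch only shows that dropping the grouped hash does not drop the relation's own cache, while a reset-style method does.

```ruby
# Hypothetical stand-in for ActiveRecord::Relation (not the real class):
# it memoizes loaded rows in @records, like the ivar named in the update.
class FakeRelation
  def initialize(&loader)
    @loader = loader
    @records = nil
  end

  # Load once, then keep serving the cached array.
  def to_a
    @records ||= @loader.call
  end

  # Same idea as Relation#reset: drop the cached rows.
  def reset
    @records = nil
    self
  end

  # Number of records the relation is currently holding on to.
  def cached_count
    @records ? @records.size : 0
  end
end

FakeStudent = Struct.new(:last_name) # hypothetical model

relation = FakeRelation.new { %w[Lee Kim Lee].map { |n| FakeStudent.new(n) } }
x = relation.to_a.group_by(&:last_name)
x = nil                      # the grouped hash is gone...
puts relation.cached_count   # ...but the relation still holds 3 records
relation.reset
puts relation.cached_count   # 0: the relation no longer keeps them alive
```

Whether the garbage collector then reclaims the records immediately still depends on the Ruby version and on what else refers to them, so the sketch deliberately makes no claim about ObjectSpace counts.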

AmitA asked Jun 22 '12 03:06


1 Answer

I think I know what's going on. Ruby's GC won't free immutable objects (like symbols!). The keys returned by group_by are immutable strings, and so they won't be garbage collected.

UPDATE:

It seems like the problem is not with Rails itself. I tried using group_by alone, and sometimes the objects would not get garbage collected:

    oscardelben~/% irb
    irb(main):001:0> class Foo
    irb(main):002:1> end
    => nil
    irb(main):003:0> {"1" => Foo.new, "2" => Foo.new}
    => {"1"=>#<Foo:0x007f9efd8072a0>, "2"=>#<Foo:0x007f9efd807250>}
    irb(main):004:0> ObjectSpace.each_object(Foo).count
    => 2
    irb(main):005:0> GC.start
    => nil
    irb(main):006:0> ObjectSpace.each_object(Foo).count
    => 0
    irb(main):007:0> {"1" => Foo.new, "2" => Foo.new}.group_by
    => #<Enumerator: {"1"=>#<Foo:0x007f9efb83d0c8>, "2"=>#<Foo:0x007f9efb83d078>}:group_by>
    irb(main):008:0> GC.start
    => nil
    irb(main):009:0> ObjectSpace.each_object(Foo).count
    => 2 # Not garbage collected
    irb(main):010:0> GC.start
    => nil
    irb(main):011:0> ObjectSpace.each_object(Foo).count
    => 0 # Garbage collected

I've dug through the GC internals (which are surprisingly easy to understand), and this seems like a scope issue. Ruby walks through all the objects in the current scope and marks the ones it thinks are still in use; after that, it goes through all the objects in the heap and frees the ones that have not been marked.

In this case I think the hash is still being marked even though it's out of scope. There are many reasons why this may be happening. I'll keep investigating.
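The mark-then-sweep idea described above can be sketched as a toy model. This is not MRI's actual collector: the "heap" here is just a Hash of made-up ids mapped to the ids they reference, and the names (stale_ref, foo_1, and so on) are hypothetical. It only illustrates why everything reachable from a single leftover root survives a collection.

```ruby
require "set"

# Mark phase: follow references from the roots and remember every id reached.
def mark(heap, roots)
  marked = Set.new
  stack = roots.dup
  until stack.empty?
    id = stack.pop
    next if marked.include?(id)
    marked << id
    stack.concat(heap.fetch(id, []))
  end
  marked
end

# Sweep phase: keep only the entries that were marked.
def sweep(heap, marked)
  heap.select { |id, _refs| marked.include?(id) }
end

# Toy heap: id => ids it references (all names are illustrative).
heap = {
  stale_ref: [:hash],        # something that still points at the hash
  hash:      [:foo_1, :foo_2],
  foo_1:     [],
  foo_2:     [],
  orphan:    []              # referenced by nothing
}

p sweep(heap, mark(heap, [:stale_ref])).keys # => [:stale_ref, :hash, :foo_1, :foo_2]
p sweep(heap, mark(heap, [])).keys           # => []
```

In the toy model, the Foo objects are only freed once the last root that reaches the hash disappears, which matches the pattern in the irb session where a second GC.start was needed.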

UPDATE 2:

I've found what's keeping references to the objects. To do that I used the ruby-mass gem. It turns out that the Active Record relation keeps track of the objects it returns.

    User.limit(1).group_by(&:name)
    GC.start
    ObjectSpace.each_object(ActiveRecord::Base).each do |obj|
      p Mass.references obj # {"ActiveRecord::Relation#70247565268860"=>["@records"]}
    end

Unfortunately, calling reset on the relation didn't seem to help, but hopefully this is enough information for now.
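For readers who want a rough idea of this kind of reference hunt without the gem, here is a small sketch using only core Ruby. It is not how ruby-mass works internally; ivar_references and RecordHolder are hypothetical names made up for this example, and the helper only inspects instance variables of one class, looking at direct values and at Arrays one level deep.

```ruby
# Hypothetical helper: for every live instance of +klass+, list the instance
# variables that hold +target+ directly or inside an Array.
def ivar_references(target, klass)
  refs = {}.compare_by_identity
  ObjectSpace.each_object(klass) do |holder|
    names = holder.instance_variables.select do |name|
      value = holder.instance_variable_get(name)
      target.equal?(value) ||
        (Array === value && value.any? { |item| target.equal?(item) })
    end
    refs[holder] = names unless names.empty?
  end
  refs
end

# Hypothetical stand-in for a relation that caches its rows in @records.
class RecordHolder
  def initialize(records)
    @records = records
    @label = "demo"
  end
end

record = Object.new
holder = RecordHolder.new([record])
p ivar_references(record, RecordHolder).values # => [[:@records]]
```

In a Rails console one could try passing ActiveRecord::Relation as the class, though that is an untested assumption here and deeper or non-Array containers would be missed.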

Oscar Del Ben answered Sep 22 '22 17:09