Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the cause of a memory leak in Ruby

I've discovered a memory leak in my Rails code - that is to say, I've found what code leaks but not why it leaks. I've reduced it down to a test case that doesn't require Rails:

require 'csspool' require 'ruby-mass'  def report     puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1].to_s + 'KB'     Mass.print end  report  # note I do not store the return value here CSSPool::CSS::Document.parse(File.new('/home/jason/big.css'))  ObjectSpace.garbage_collect sleep 1  report 

ruby-mass supposedly lets me see all the objects in memory. CSSPool is a CSS parser based on racc. /home/jason/big.css is a 1.5MB CSS file.

This outputs:

Memory 9264KB  ==================================================  Objects within [] namespace ==================================================   String: 7261   RubyVM::InstructionSequence: 1151   Array: 562   Class: 313   Regexp: 181   Proc: 111   Encoding: 99   Gem::StubSpecification: 66   Gem::StubSpecification::StubLine: 60   Gem::Version: 60   Module: 31   Hash: 29   Gem::Requirement: 25   RubyVM::Env: 11   Gem::Specification: 8   Float: 7   Gem::Dependency: 7   Range: 4   Bignum: 3   IO: 3   Mutex: 3   Time: 3   Object: 2   ARGF.class: 1   Binding: 1   Complex: 1   Data: 1   Gem::PathSupport: 1   IOError: 1   MatchData: 1   Monitor: 1   NoMemoryError: 1   Process::Status: 1   Random: 1   RubyVM: 1   SystemStackError: 1   Thread: 1   ThreadGroup: 1   fatal: 1 ==================================================  Memory 258860KB  ==================================================  Objects within [] namespace ==================================================   String: 7456   RubyVM::InstructionSequence: 1151   Array: 564   Class: 313   Regexp: 181   Proc: 113   Encoding: 99   Gem::StubSpecification: 66   Gem::StubSpecification::StubLine: 60   Gem::Version: 60   Module: 31   Hash: 30   Gem::Requirement: 25   RubyVM::Env: 13   Gem::Specification: 8   Float: 7   Gem::Dependency: 7   Range: 4   Bignum: 3   IO: 3   Mutex: 3   Time: 3   Object: 2   ARGF.class: 1   Binding: 1   Complex: 1   Data: 1   Gem::PathSupport: 1   IOError: 1   MatchData: 1   Monitor: 1   NoMemoryError: 1   Process::Status: 1   Random: 1   RubyVM: 1   SystemStackError: 1   Thread: 1   ThreadGroup: 1   fatal: 1 ================================================== 

You can see the memory going way up. Some of the counters go up, but no objects specific to CSSPool are present. I used ruby-mass's "index" method to inspect the objects that have references like so:

Mass.index.each do |k,v|     v.each do |id|         refs = Mass.references(Mass[id])         puts refs if !refs.empty?     end end 

But again, this doesn't give me anything related to CSSPool, just gem info and such.

I've also tried outputting "GC.stat"...

puts GC.stat CSSPool::CSS::Document.parse(File.new('/home/jason/big.css')) ObjectSpace.garbage_collect sleep 1 puts GC.stat 

Result:

{:count=>4, :heap_used=>126, :heap_length=>138, :heap_increment=>12, :heap_live_num=>50924, :heap_free_num=>24595, :heap_final_num=>0, :total_allocated_object=>86030, :total_freed_object=>35106} {:count=>16, :heap_used=>6039, :heap_length=>12933, :heap_increment=>3841, :heap_live_num=>13369, :heap_free_num=>2443302, :heap_final_num=>0, :total_allocated_object=>3771675, :total_freed_object=>3758306} 

As I understand it, if an object is not referenced and garbage collection happens, then that object should be cleared from memory. But that doesn't seem to be what's happening here.

I've also read about C-level memory leaks, and since CSSPool uses Racc which uses C code, I think this is a possibility. I've run my code through Valgrind:

valgrind --partial-loads-ok=yes --undef-value-errors=no --leak-check=full --fullpath-after= ruby leak.rb 2> valgrind.txt 

Results are here. I'm not sure if this confirms a C-level leak, as I've also read that Ruby does things with memory that Valgrind doesn't understand.

Versions used:

  • Ruby 2.0.0-p247 (this is what my Rails app runs)
  • Ruby 1.9.3-p392-ref (for testing with ruby-mass)
  • ruby-mass 0.1.3
  • CSSPool 4.0.0 from here
  • CentOS 6.4 and Ubuntu 13.10
like image 673
Jason Barnabe Avatar asked Dec 04 '13 20:12

Jason Barnabe


People also ask

What causes memory leaks in Ruby?

Slow Release. Another important cause of memory bloat in Ruby is a slow release of freed memory back to the system. In this situation, memory is freed much more slowly than the rate at which new memory blocks are allocated to objects.

How do you detect a memory leak?

The best approach to checking for the existence of a memory leak in your application is by looking at your RAM usage and investigating the total amount of memory been used versus the total amount available. Evidently, it is advisable to obtain snapshots of your memory's heap dump while in a production environment.

What is the main cause of memory leaks?

A memory leak starts when a program requests a chunk of memory from the operating system for itself and its data. As a program operates, it sometimes needs more memory and makes an additional request.


1 Answers

It looks like you are entering The Lost World here. I don’t think the problem is with c-bindings in racc either.

Ruby memory management is both elegant and cumbersome. It stores objects (named RVALUEs) in so-called heaps of size of approx 16KB. On a low level, RVALUE is a c-struct, containing a union of different standard ruby object representations.

So, heaps store RVALUE objects, which size is not more than 40 bytes. For such objects as String, Array, Hash etc. this means that small objects can fit in the heap, but as soon as they reach a threshold, an extra memory outside of the Ruby heaps will be allocated.

This extra memory is flexible; is will be freed as soon as an object became GC’ed. That’s why your testcase with big_string shows the memory up-down behaviour:

def report   puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`           .strip.split.map(&:to_i)[1].to_s + 'KB' end report big_var = " " * 10000000 report big_var = nil  report ObjectSpace.garbage_collect sleep 1 report # ⇒ Memory 11788KB # ⇒ Memory 65188KB # ⇒ Memory 65188KB # ⇒ Memory 11788KB 

But the heaps (see GC[:heap_length]) themselves are not released back to OS, once acquired. Look, I’ll make a humdrum change to your testcase:

- big_var = " " * 10000000 + big_var = 1_000_000.times.map(&:to_s) 

And, voilá:

# ⇒ Memory 11788KB # ⇒ Memory 65188KB # ⇒ Memory 65188KB # ⇒ Memory 57448KB 

The memory is not released back to OS anymore, because each element of the array I introduced suits the RVALUE size and is stored in the ruby heap.

If you’ll examine the output of GC.stat after the GC was run, you’ll find that GC[:heap_used] value is decreased as expected. Ruby now has a lot of empty heaps, ready.

The summing up: I don’t think, the c code leaks. I think the problem is within base64 representation of huge image in your css. I have no clue, what’s happening inside parser, but it looks like the huge string forces the ruby heap count to increase.

Hope it helps.

like image 90
Aleksei Matiushkin Avatar answered Sep 18 '22 13:09

Aleksei Matiushkin