Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Debugging sidekiq worker memory leak with jemalloc

So, I've got a memory leak in my Sidekiq worker. I have a worker server with one queue for this worker task only, that gets to about 10G RSS in a week.

I tried to reproduce it locally with only 1 worker thread and voila - I get from 200M to 1G in one night, processing 1 task/min. Naturally, I want to know what's leaking, so I'm also logging RSS, heap_live_slots and heap_free_slots. When I plot results I can see steady RSS growth while live and free slots fluctuate randomly but in well defined and constant boundaries, while their sum remains constant.

At this point I come to the conclusion that the leak likely occurs not in Ruby code, but rather in some native extension. So I reinstall ruby with Jemalloc support via RVM: rvm reinstall 2.4.2 --with-jemalloc

Then I set up MALLOC_CONF:

export MALLOC_CONF='prof_leak:true,lg_prof_sample:0,prof_final:true,stats_print:true'

And fire up Sidekiq. Freshly started Sidekiq with 1 worker thread is worth about 200M RSS, but when I press Ctrl+C and look at stats output of jemalloc I see something completely different:

Arenas: 32
Quantum size: 16
Page size: 4096
Maximum thread-cached size class: 32768
Allocated: 34056, active: 61440, metadata: 2949272, resident: 2981888, mapped: 6352896, retained: 2035712

What? 6M mapped? This cannot be true. So I start irb and do the following:

2.4.2 :001 > arr = []
 => [] 
2.4.2 :002 > loop do
2.4.2 :003 >   arr << 'a'*10000000
2.4.2 :004?>   sleep 1
2.4.2 :005?> end

After waiting until irb process climbs to about 1G RSS I stop the process... and see exactly same numbers. Maybe visualizing the call graph will help me understand what's going on?

jeprof --show_bytes --pdf `which ruby` jeprof.10536.0.f.heap > ruby.pdf

Using local file /home/mhi/.rvm/rubies/ruby-2.4.2/bin/ruby.
Using local file jeprof.10536.0.f.heap.
No nodes to print

So something is clearly wrong, and that's what I need help figuring out.

Here's full output from jemalloc stat: https://pastebin.com/RiMLtqA6

UPD.

So I've updated all the native-extension related gems, here's output of bundle exec ruby -e 'puts Gem.loaded_specs.values.select{ |i| !i.extensions.empty? }.map{ |i| "#{i.name} #{i.version}" }':

io-console 0.4.6
nokogiri 1.8.1
bcrypt 3.1.11
debug_inspector 0.0.3
binding_of_caller 0.7.2
json 2.1.0
capybara-webkit 1.14.0
damerau-levenshtein 1.3.0
unf_ext 0.0.7.4
eventmachine 1.2.5
ffi 1.9.18
kgio 2.11.0
msgpack 1.1.0
mysql2 0.4.9
rainbow 2.2.2
raindrops 0.18.0
rbtrace 0.4.8
stackprof 0.2.10
therubyracer 0.12.3
unicode 0.4.4.4
unicorn 5.3.0

Same result: RSS, Memory slots

like image 621
Roman Avatar asked Nov 10 '17 10:11

Roman


1 Answers

Ruby 2.4.2 has a known issue with jemalloc.

the issue was closed about a month ago, but I don't know if the package you're using is patched... Actually, I don't think the patch was released yet. It's likely all the jemalloc stat data is irrelevant.

Also, this seems like an X-Y question (you're asking about jemalloc, while you probably want a solution to the memory "leak").

Before assuming a memory leak in native code (though a distinct possibility), I would consider how the task's scope might affect the garbage collector and try to minimize both the scope and any long-life variables.

For example, if your task is a Proc, it might be bound to the global scope, which means that some variables might live forever...

Try enclosing the task in a function and make sure that none of the variables are referenced once the task is done.

like image 179
Myst Avatar answered Oct 21 '22 16:10

Myst