I'm using Twitter, Mongo, and Parallel in a loop to retrieve and store data.
Memory utilization is hitting 1.5GB+.
Why isn't the GC cleaning this up?
UPDATE: Here is the script in question.
allocated memory by location
-----------------------------------
973409328 /Users/jordan/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/timeout.rb:82
359655091 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/json-1.8.3/lib/json/common.rb:155
34706221 /Users/jordan/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/openssl/buffering.rb:182
31767589 /Users/jordan/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/net/http/response.rb:368
22055648 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/parallel-1.6.1/lib/parallel.rb:183
12129637 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/addressable-2.3.8/lib/addressable/uri.rb:525
11115133 /Users/jordan/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/net/protocol.rb:172
10609088 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/addressable-2.3.8/lib/addressable/idna/pure.rb:177
8333448 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/twitter-5.15.0/lib/twitter/base.rb:152
6041744 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/thread_safe-0.3.5/lib/thread_safe/non_concurrent_cache_backend.rb:8
4857232 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/addressable-2.3.8/lib/addressable/uri.rb:1477
4583920 /Users/jordan/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/monitor.rb:241
4524872 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/memoizable-0.4.2/lib/memoizable/method_builder.rb:117
4282752 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/twitter-5.15.0/lib/twitter/base.rb:151
4200641 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/mongo-2.1.1/lib/mongo/monitoring/command_log_subscriber.rb:104
3283047 /Users/jordan/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/net/http/response.rb:61
3150696 /Users/jordan/.rvm/gems/ruby-2.1.5/gems/mongo-2.1.1/lib/mongo/server/monitor.rb:125
allocated memory by gem
-----------------------------------
1084770550 ruby-2.1.5/lib
359655091 json-1.8.3
53016839 addressable-2.3.8
22069048 parallel-1.6.1
18422826 twitter-5.15.0
10829988 mongo-2.1.1
8908392 memoizable-0.4.2
6041744 thread_safe-0.3.5
4904294 faraday-0.9.2
3839455 other
3382080 naught-1.1.0
2429320 bson-3.2.6
1123917 rubygems
320962 rollbar-2.4.0
205097 activesupport-4.2.4
20005 multi_json-1.11.2
Ruby memory management is both elegant and cumbersome. It stores objects (called RVALUEs) in so-called heaps of approximately 16KB each. At a low level, an RVALUE is a C struct containing a union of the different standard Ruby object representations.
So, heaps store RVALUE objects, each of which is no larger than 40 bytes. For objects such as String, Array, Hash, etc., this means that small objects fit inside the heap, but as soon as they exceed a threshold, extra memory outside the Ruby heaps is allocated.
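To see the difference between a string that fits inside its slot and one that needs an external buffer, a minimal sketch using the stdlib objspace extension follows. The variable names are illustrative, and the exact sizes reported depend on the Ruby version and platform, so treat the numbers as indicative only.

```ruby
require "objspace"

# A short string: small enough to live inside the object slot itself.
short = "a" * 10

# A long string: its bytes are stored in a separate buffer outside the Ruby heap.
long = "a" * 10_000

short_size = ObjectSpace.memsize_of(short)
long_size  = ObjectSpace.memsize_of(long)

puts "short string: #{short_size} bytes"
puts "long string:  #{long_size} bytes"
```

The long string reports roughly its byte length on top of the slot size, because the external buffer is counted as well.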
This extra memory is flexible; it is freed as soon as the object is garbage collected. But the heaps themselves are not released back to the OS.
That said, once you load many short strings into Ruby memory simultaneously, the number of heaps grows, and this memory is never returned to the OS. This might sound weird, but please try not to store strings shorter than 23 characters. That's insane, sorry for the proposal :)
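One way to watch the heaps grow is to read GC.stat before and after creating many short strings. This is only a sketch: it assumes a Ruby version where GC.stat exposes the :heap_allocated_pages key (2.2 and later), and whether the page count drops again after GC.start varies between Ruby versions.

```ruby
# Number of heap pages Ruby currently holds.
pages_before = GC.stat(:heap_allocated_pages)

# Create many short strings at once; each one occupies a heap slot.
strings = Array.new(200_000) { |i| "s#{i}" }
pages_during = GC.stat(:heap_allocated_pages)
count = strings.size

# Drop the references and force a full collection.
strings = nil
GC.start
pages_after = GC.stat(:heap_allocated_pages)

puts "created #{count} strings"
puts "heap pages before: #{pages_before}"
puts "heap pages during: #{pages_during}"
puts "heap pages after:  #{pages_after}"
```

If the "after" figure stays close to the "during" figure, the process is still holding the pages even though the strings are gone.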
That might help as well: http://www.sitepoint.com/ruby-uses-memory/
You're loading in a ton of data (depending on how many users you have) and then sleeping for 20 seconds in parallel. So basically, if you have a hundred users, you're retrieving the Twitter data for 100 users at once, then sleeping, then doing that again, and so on. The memory probably looks like it's being attributed to timeout because that's what is in charge during the 20-second sleep.
Try reducing the number of threads you're using from keys.length to only a few (play with the number).
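As a rough illustration of capping concurrency, here is a stdlib-only sketch of a fixed-size worker pool. The helper each_in_pool, the stand-in keys array, and the pool size of 4 are all hypothetical and not taken from the original script; with the parallel gem, the equivalent idea is passing a small in_threads: value instead of keys.length.

```ruby
# Hypothetical helper: run the block for every item using at most
# pool_size threads, so only a few items are in flight at a time.
def each_in_pool(items, pool_size)
  queue = Queue.new
  items.each { |item| queue << item }

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        item = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        yield item
      end
    end
  end
  workers.each(&:join)
end

keys = (1..20).map { |i| "key-#{i}" } # stand-in for the real API keys
mutex = Mutex.new
active = 0
peak = 0
processed = []

each_in_pool(keys, 4) do |key|
  mutex.synchronize { active += 1; peak = [peak, active].max }
  sleep 0.01 # stand-in for fetching and storing data
  mutex.synchronize { active -= 1; processed << key }
end

puts "processed=#{processed.size} peak_threads=#{peak}"
```

With the pool capped, only a handful of responses are held in memory at any moment, which is the effect the suggestion above is after.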