 

Why does repeated JSON parsing consume more and more memory?

Parsing the same JSON file over and over again in Ruby seems to use ever larger amounts of memory. Consider the code and the output below:

  1. Why isn't the memory freed up after the first iteration?
  2. Why does a 116MB JSON file take up 1.5GB of RAM after parsing? It's surprising, considering the text file is just being converted into hashes. What am I missing here?

Code:

require 'json'

# Resident set size (RSS) of the current process in MB, read from `ps`.
def memused
  `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1] / 1024
end

text = IO.read('../data-grouped/2012-posts.json')
puts "before parsing: #{memused}MB"
iter = 1
while true
  items = JSON.parse(text)
  GC.start
  puts "#{iter}: #{memused}MB"
  iter += 1
end

Output:

before parsing: 116MB
1: 1840MB
2: 2995MB
3: 2341MB
4: 3017MB
5: 2539MB
6: 3019MB
asked Jun 17 '13 by vrepsys
1 Answer

When Ruby parses a JSON file, it creates many intermediate objects along the way. These objects stay in memory until the GC starts working.

If the JSON file has a complicated structure, with many nested arrays and inner objects, the number of allocations grows quickly too. On 64-bit MRI every object occupies at least a 40-byte heap slot, and strings, arrays, and hashes allocate additional memory beyond that, so millions of small objects can easily take up an order of magnitude more RAM than the original text.
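One way to see those intermediate objects on MRI is ObjectSpace.count_objects, which reports live object counts per type. A minimal sketch with an inline document (the JSON snippet here is purely illustrative):

require 'json'

# Settle the heap, then snapshot live object counts per type.
GC.start
before = ObjectSpace.count_objects.dup

# The parse result is not kept, but the objects it created
# remain on the heap until the next GC run.
JSON.parse('{"posts": [{"id": 1, "tags": ["ruby", "json"]}]}')

after = ObjectSpace.count_objects
[:T_STRING, :T_ARRAY, :T_HASH].each do |type|
  puts "#{type}: +#{after[type] - before[type]}"
end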

Did you try calling GC.start to ask Ruby to clean up unused memory? If the amount of memory in use drops significantly, that suggests the growth is mostly intermediate objects used to parse the data; otherwise, your data structure is complex, or there is something in your data that the library can't deallocate. Keep in mind that even when the GC does free objects, MRI rarely returns heap pages to the operating system, so the resident size reported by ps tends to stay near its high-water mark rather than shrinking after the first iteration.
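A minimal sketch of that check, using a placeholder file name; note that the GC.stat key is :heap_live_slots on Ruby 2.1+ and :heap_live_num on older MRI:

require 'json'

text = File.read('posts.json')  # placeholder path

items = JSON.parse(text)
puts "live heap slots after parse: #{GC.stat[:heap_live_slots]}"

items = nil  # drop the only reference so the parse result becomes garbage
GC.start
puts "live heap slots after GC:   #{GC.stat[:heap_live_slots]}"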

For large JSON processing I use yajl-ruby (https://github.com/brianmario/yajl-ruby). It is implemented in C and has a low memory footprint.
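For illustration, a minimal sketch of parsing straight from an IO with yajl-ruby (the gem is required as 'yajl'; the path follows the question and is otherwise an assumption):

require 'yajl'

# Let the C parser read from the file handle directly instead of
# loading the whole document into one big Ruby string first.
json = File.open('../data-grouped/2012-posts.json', 'r')
parser = Yajl::Parser.new
items = parser.parse(json)
json.close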

answered Nov 12 '22 by Thiago Lewin