It seems that parsing the same JSON file over and over again in Ruby uses more and more memory. Consider the code and the output below:
Code:
require 'json'
def memused
  `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1] / 1024
end
text = IO.read('../data-grouped/2012-posts.json')
puts "before parsing: #{memused}MB"
iter = 1
while true
  items = JSON.parse(text)
  GC.start
  puts "#{iter}: #{memused}MB"
  iter += 1
end
Output:
before parsing: 116MB
1: 1840MB
2: 2995MB
3: 2341MB
4: 3017MB
5: 2539MB
6: 3019MB
When Ruby parses a JSON file, it creates many intermediate objects along the way. These objects stay in memory until the GC starts working.
If the JSON file has a complicated structure, with many arrays and nested objects, the number of intermediate objects grows quickly too.
Did you try calling "GC.start" to ask Ruby to clean up unused memory? If the amount of memory decreases significantly, that suggests it was just the intermediate objects used to parse the data; otherwise, your data structure is complex, or there is something in your data that the library can't deallocate.
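As a rough sketch of that check (using the same file path as in the question; the exact numbers will vary), you can also compare live object counts before parsing, after parsing, and after GC.start:

require 'json'

# Live objects = total heap slots minus free slots (MRI-specific counters).
counts = ObjectSpace.count_objects
before = counts[:TOTAL] - counts[:FREE]

items = JSON.parse(IO.read('../data-grouped/2012-posts.json'))
counts = ObjectSpace.count_objects
after_parse = counts[:TOTAL] - counts[:FREE]

GC.start
counts = ObjectSpace.count_objects
after_gc = counts[:TOTAL] - counts[:FREE]

# A large drop after GC.start points at collectable intermediate objects;
# whatever remains is roughly the parsed structure still referenced by `items`.
puts "live objects: #{before} -> #{after_parse} -> #{after_gc}"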
For processing large JSON files I use yajl-ruby (https://github.com/brianmario/yajl-ruby). It is implemented in C and has a low memory footprint.
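A minimal sketch (assuming the same file path as in the question): yajl-ruby can parse straight from an IO, so you don't have to read the whole file into a String first:

require 'yajl'

# Yajl::Parser.parse accepts an IO or a String; parsing from the file
# handle avoids holding the raw JSON text in memory alongside the result.
json = File.open('../data-grouped/2012-posts.json', 'r') do |io|
  Yajl::Parser.parse(io)
end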