
Ruby big array and memory

I created a big array a, and the process's memory usage grew to ~500 MB:

a = []

t = Thread.new do 
  loop do
    sleep 1
    print "#{a.size} "
  end
end

5_000_000.times do
  a << [rand(36**10).to_s(36)]
end

puts "\n size is #{a.size}"
a = []

t.join

After that, I "cleared" a by assigning a new empty array to it, but the allocated memory didn't change until I killed the process. Is there something special I need to do to remove all the data that was assigned to a from memory?

asked Aug 11 '12 07:08 by evfwcqcg




1 Answer

If I use the Ruby Garbage Collection Profiler on a lightly modified version of your code:

GC::Profiler.enable
GC::Profiler.clear

a = []
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end

puts "\n size is #{a.size}"
a = []

GC::Profiler.report

I get the following output (on Ruby 1.9.3; some columns and rows removed):

GC 60 invokes.
Index    Invoke Time(sec)       Use Size(byte)     Total Size(byte)     ...
    1               0.109               131136               409200     ...
    2               0.125               192528               409200     ...
  ...
   58              33.484            199150344            260938656     ...
   59              36.000            211394640            260955024     ...

The profile starts with 131,136 bytes in use and ends with 211,394,640 bytes in use. The use size never decreases anywhere in the run, so although the collector was invoked 60 times, we can assume it never reclaimed the array's data: a still referenced the full array each time it ran.
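The same check can be made programmatically instead of by reading the report. This is a minimal sketch, assuming Ruby 2.0 or later, where GC::Profiler.raw_data returns one hash per recorded GC run with a :HEAP_USE_SIZE key. The array is smaller than the question's 5 million elements only to keep the run short, and the names samples and drops are illustrative:

```ruby
GC::Profiler.enable
GC::Profiler.clear

a = []
200_000.times do
  a << [rand(36**10).to_s(36)]
end

# One entry per recorded GC run; Array() guards against a nil result.
samples = Array(GC::Profiler.raw_data).map { |run| run[:HEAP_USE_SIZE] }

# Count the runs after which the use size was lower than after the previous run.
drops = samples.each_cons(2).count { |previous, current| current < previous }

puts "GC runs recorded: #{samples.size}, runs where use size dropped: #{drops}"
GC::Profiler.disable
```

If drops stays at 0 while a is still referenced, that matches the pattern in the table above.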

If I then add one line of code that appends a single element to a, placed after a has grown to 5 million elements and has then been reassigned to an empty array:

GC::Profiler.enable
GC::Profiler.clear

a = []
5_000_000.times do
  a << [rand(36**10).to_s(36)]
end

puts "\n size is #{a.size}"
a = []

# the only change is to add one element to the (now) empty array a
a << [rand(36**10).to_s(36)]

GC::Profiler.report

This changes the profiler output to (some columns and rows removed):

GC 62 invokes.
Index    Invoke Time(sec)       Use Size(byte)     Total Size(byte)     ...
    1               0.156               131376               409200     ...
    2               0.172               192792               409200     ...
  ...
   59              35.375            211187736            260955024     ...
   60              36.625            211395000            469679760     ...
   61              41.891              2280168            307832976     ...

This run starts with 131,376 bytes in use, similar to the previous run, and grows in the same way, but it ends with 2,280,168 bytes in use, far lower than the 211,394,640 bytes at the end of the previous run. We can assume that the old array's data was garbage collected during this run, probably triggered by the new line of code that adds an element to a.
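Rather than relying on a later allocation to trigger a collection, a collection can also be requested directly with GC.start. This is a hedged sketch, assuming Ruby 2.2 or later for the :heap_live_slots key of GC.stat; build_big_array is a hypothetical helper, and the array is smaller than in the question. Because CRuby scans the stack conservatively, the exact numbers vary between runs and a drop in live slots is likely but not guaranteed:

```ruby
# Hypothetical helper: builds the array inside its own method call,
# mirroring the question's structure of one-element arrays holding a string.
def build_big_array(n)
  Array.new(n) { [rand(36**10).to_s(36)] }
end

a = build_big_array(200_000)
runs_before = GC.count
live_before = GC.stat[:heap_live_slots]

a = []      # drop the only reference to the big array
GC.start    # request a full collection now

runs_after = GC.count
live_after = GC.stat[:heap_live_slots]

puts "GC runs: #{runs_before} -> #{runs_after}"
puts "live slots: #{live_before} -> #{live_after}"
```

This only shows that Ruby's heap slots become free for reuse; it does not by itself show how much memory the process hands back to the operating system.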

The short answer is no, you don't need to do anything special to remove the data that was assigned to a, but hopefully this gives you the tools to prove it.

answered Nov 23 '22 03:11 by tsundoku