Ruby not releasing memory

I have Ruby code that more or less looks like this:

offset = 0
index = 1

User.establish_connection(..) # db1
class Member < ActiveRecord::Base
  self.table_name = 'users'
end 

Member.establish_connection(..) #db2

limit = 10000

while true
  users = User.limit(limit).offset(offset).as_json # from database 1
  offset = limit * index
  index += 1
  users.each do |u|
    member =  Member.find_by(name: u[:name])
    if member.nil?
      Member.create(u)
    elsif member.updated_at < u[:updated_at]   
      member.update_attributes(u)   
    end
  end 
  break if break_condition
end

What I'm seeing is that the RSS memory (htop) keeps growing, and at one point it reaches 10GB. I'm not sure why this is happening, but the memory never seems to be released by Ruby back to the OS.

I'm aware there is a long list of questions in line with this one. I have even tried changing my code to look like this (the last 3 lines specifically), i.e. running GC.start manually, but the result is still the same:

while true

  # ... same body as above ...
users = nil
GC.start
break if break_condition
end

Tested this on Ruby version 2.2.2 and 2.3.0
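
For what it's worth, GC.stat can distinguish Ruby-level reclamation from OS-level RSS. A minimal standalone check (plain Ruby, nothing from the code above):

```ruby
# Plain-Ruby check of what GC.start reclaims, independent of RSS.
# heap_live_slots counts Ruby's live object slots.
before = GC.stat(:heap_live_slots)

garbage = Array.new(100_000) { "x" * 10 }   # allocate ~100k short-lived strings
allocated = GC.stat(:heap_live_slots)

garbage = nil
GC.start
after = GC.stat(:heap_live_slots)

# allocated sits well above before; after typically drops back toward before,
# even when RSS in htop stays flat (freed slots are reused, not returned).
puts "live slots: #{before} -> #{allocated} -> #{after}"
```

If the live-slot count drops after GC.start while RSS stays high, the memory is being reclaimed inside Ruby's heap but not returned to the OS.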

EDIT: Other details

1) OS.

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.04
DISTRIB_CODENAME=vivid
DISTRIB_DESCRIPTION="Ubuntu 15.04"

2) Ruby installed and compiled via RVM.

3) ActiveRecord version 4.2.6

Viren asked Oct 31 '22

1 Answer

I can't tell you the source of the memory leak, but I do spy some low-hanging fruit.

But first, two things:

  1. Are you sure that ActiveRecord is the right way to copy data from one database to another? I'm very confident that it's not. Every major database product has robust export and import capabilities, and the performance you'll see there will be many, many times better than doing it in Ruby, and you can always invoke those tools from within your app. Think hard about that before you continue down this path.

  2. Where does the number 10,000 come from? Your code suggests that you know it's not a good idea to fetch all of the records at once, but 10,000 is still a lot of records. You may see some gains by simply trying different numbers: 100 or 1,000, say.
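
As a rough model of the batching arithmetic (plain Ruby, no database; the 25-element array and batch size of 10 are invented for illustration), a smaller batch size trades more round trips for a smaller peak working set:

```ruby
# Toy model of offset-based batching over an in-memory "table".
records    = (1..25).to_a
batch_size = 10

batches = []
offset  = 0
loop do
  batch = records[offset, batch_size] || []  # nil past the end
  break if batch.empty?
  batches << batch
  offset += batch_size
end

batches.map(&:size)  # => [10, 10, 5]
```

In the real loop, batch_size is the number you pass to limit, and each batch would be processed and discarded before the next fetch.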

That said, let's dig into what this line is doing:

users = User.limit(10000).offset(offset).as_json

The first part, User.limit(10000).offset(offset), creates an ActiveRecord::Relation object representing your query. When you call as_json on it, the query is executed, which instantiates 10,000 User model objects and puts them in an array, and then a Hash is constructed from each of those User objects' attributes. (Take a look at the source for ActiveRecord::Relation#as_json.)

In other words, you're instantiating 10,000 User objects only to throw them away after you've got their attributes.

So, a quick win is to skip that part entirely. Just select the raw data:

user_keys = User.attribute_names

until break_condition
  # ...
  users_values = User.limit(10000).offset(offset).pluck(*user_keys)

  users_values.each do |vals|
    user_attrs = user_keys.zip(vals).to_h
    member = Member.find_by(name: user_attrs["name"])
    member.update_attributes(user_attrs) if member
  end
end

ActiveRecord::Calculations#pluck returns an array of arrays containing the values from each record. Inside the users_values.each loop we turn each values array into a Hash. No need to instantiate any User objects.
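
For example, with made-up keys and one made-up row (the real keys come from User.attribute_names), the zip + to_h step looks like this:

```ruby
user_keys = ["id", "name", "updated_at"]          # e.g. User.attribute_names
vals      = [1, "Viren", "2016-01-01 00:00:00"]   # one row returned by pluck

user_attrs = user_keys.zip(vals).to_h
# => {"id"=>1, "name"=>"Viren", "updated_at"=>"2016-01-01 00:00:00"}
```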

Now let's take a look at this:

member = Member.find_by(name: user_attrs["name"])
member.update_attributes(user_attrs)

This selects a record from the database, instantiates a Member object, and then updates the record in the database—10,000 times in every iteration of the while loop. This is the correct approach if you need validations to run when that record is updated. If you don't need validations to run, though, you can save time and memory by, again, not instantiating any objects:

Member.where(name: user_attrs["name"]).update_all(user_attrs)

The difference is that ActiveRecord::Relation#update_all doesn't select the record from the database or instantiate a Member object, it just updates it. You said in your comment above that you have a unique constraint on the name column, so we know that this will update only a single record.

Having made those changes, you must still contend with the fact that you have to do 10,000 UPDATE queries in each iteration of the while loop. Again, consider using your databases' built-in export and import functionality instead of trying to make Rails do this.

Jordan Running answered Nov 15 '22