I have Ruby code that more or less looks like this:
offset = 0
limit = 10_000
index = 1

User.establish_connection(..) # db1

class Member < ActiveRecord::Base
  self.table_name = 'users'
end
Member.establish_connection(..) # db2

while true
  users = User.limit(limit).offset(offset).as_json # from database 1
  offset = limit * index
  index += 1

  users.each do |u|
    member = Member.find_by(name: u[:name])
    if member.nil?
      Member.create(u)
    elsif member.updated_at < u[:updated_at]
      member.update_attributes(u)
    end
  end

  break if break_condition
end
What I'm seeing is that the RSS memory (per htop) keeps growing, and at one point it reaches 10 GB. I'm not sure why this is happening, but the memory never seems to be released by Ruby back to the OS.
I'm aware there is a long list of questions along these lines. I have even tried changing my code to look like this (the last three lines specifically), i.e. running GC.start manually, but the result is still the same:
while true
  # ... same body as above ...
  users = nil
  GC.start
  break if break_condition
end
Tested this on Ruby versions 2.2.2 and 2.3.0.
EDIT: Other details:
1) OS.
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=15.04
DISTRIB_CODENAME=vivid
DISTRIB_DESCRIPTION="Ubuntu 15.04"
2) Ruby installed and compiled via rvm.
3) ActiveRecord version 4.2.6
I can't tell you the source of the memory leak, but I do spy some low-hanging fruit.
But first, two things:
Are you sure that ActiveRecord is the right way to copy data from one database to another? I'm very confident that it's not. Every major database product has robust export and import capabilities, and the performance you'll see there will be many, many times better than doing it in Ruby, and you can always invoke those tools from within your app. Think hard about that before you continue down this path.
Where does the number 10,000 come from? Your code suggests that you know it's not a good idea to fetch all of the records at once, but 10,000 is still a lot of records. You may see some gains by simply trying different numbers: 100 or 1,000, say.
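On that note, offset-based paging itself gets slower as the offset grows, because the database still has to scan past every skipped row. ActiveRecord's find_each/find_in_batches avoid this by paging on the primary key instead. Here's a pure-Ruby sketch of that keyset idea; the fake_table array is a stand-in for the users table, and in real code you'd simply call User.find_each(batch_size: 1000):

```ruby
# Keyset-style batching sketch: page on id instead of OFFSET.
# fake_table stands in for the users table; with ActiveRecord the same
# idea is User.find_each(batch_size: 1000) { |u| ... }.
fake_table = (1..25).map { |i| { id: i, name: "user#{i}" } }

def each_batch_by_id(rows, batch_size)
  last_id = 0
  loop do
    batch = rows.select { |r| r[:id] > last_id }.first(batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id]   # resume after the last id seen, no OFFSET
  end
end

batch_sizes = []
each_batch_by_id(fake_table, 10) { |batch| batch_sizes << batch.size }
batch_sizes # => [10, 10, 5]
```

Only one batch is alive at a time, so memory stays bounded by the batch size rather than by the table size.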
That said, let's dig into what this line is doing:
users = User.limit(10000).offset(offset).as_json
The first part, User.limit(10000).offset(offset), creates an ActiveRecord::Relation object representing your query. When you call as_json on it, the query is executed, which instantiates 10,000 User model objects and puts them in an array, and then a Hash is constructed from each of those User objects' attributes. (Take a look at the source for ActiveRecord::Relation#as_json here.)
In other words, you're instantiating 10,000 User objects only to throw them away after you've got their attributes.
So, a quick win is to skip that part entirely. Just select the raw data:
user_keys = User.attribute_names

until break_condition
  # ...
  users_values = User.limit(10000).offset(offset).pluck(*user_keys)

  users_values.each do |vals|
    user_attrs = user_keys.zip(vals).to_h
    member = Member.find_by(name: user_attrs["name"])
    member.update_attributes(user_attrs)
  end
end
ActiveRecord::Calculations#pluck returns an array of arrays containing the values from each record. Inside the users_values.each loop we turn each values array into a Hash. There's no need to instantiate any User objects.
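To make the zip/to_h step concrete, here it is with made-up values (the keys and values below are illustrative, not your real schema):

```ruby
# zip pairs keys with values positionally; to_h builds the attributes Hash.
user_keys = ["id", "name", "updated_at"]
vals      = [7, "alice", "2016-05-01 12:00:00"]

user_attrs = user_keys.zip(vals).to_h
user_attrs # => {"id"=>7, "name"=>"alice", "updated_at"=>"2016-05-01 12:00:00"}
```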
Now let's take a look at this:
member = Member.find_by(name: user_attrs["name"])
member.update_attributes(user_attrs)
This selects a record from the database, instantiates a Member object, and then updates the record in the database, 10,000 times in every iteration of the while loop. This is the correct approach if you need validations to run when that record is updated. If you don't need validations to run, though, you can save time and memory by, again, not instantiating any objects:
Member.where(name: user_attrs["name"]).update_all(user_attrs)
The difference is that ActiveRecord::Relation#update_all doesn't select the record from the database or instantiate a Member object; it just updates it. You said in your comment above that you have a unique constraint on the name column, so we know this will update only a single record.
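One wrinkle: the update_all version drops the create-when-missing branch from the original loop. Since update_all returns the number of rows it touched, that branch can be restored without instantiating anything on the common update path. Here's a pure-Ruby sketch of the counting logic; the members array stands in for the table, and with ActiveRecord the two calls would be Member.where(...).update_all(...) and Member.create(...):

```ruby
# In-memory stand-in for "update_all, then create if zero rows were touched".
members = [{ "name" => "alice", "points" => 1 }]

def upsert_by_name(members, attrs)
  touched = members.select { |m| m["name"] == attrs["name"] }
  touched.each { |m| m.merge!(attrs) }      # plays the role of update_all
  members << attrs.dup if touched.empty?    # "create" when count was zero
  touched.size                              # update_all's return value
end

upsert_by_name(members, { "name" => "alice", "points" => 2 }) # => 1 (updated)
upsert_by_name(members, { "name" => "bob",   "points" => 5 }) # => 0 (created)
```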
Having made those changes, you must still contend with the fact that you have to do 10,000 UPDATE queries in each iteration of the while loop. Again, consider using your database's built-in export and import functionality instead of trying to make Rails do this.
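For completeness, here's one shape that shelling out to the database's own tools can take from Ruby. The tool names and flags assume MySQL (mysqldump piped into mysql); the hosts and database names are made up, and the command is only built here, not executed:

```ruby
# Build a dump-and-load pipeline for one table (MySQL assumed; adjust the
# tools and flags for your database).
def copy_table_cmd(src_host, dst_host, db, table)
  "mysqldump --single-transaction -h #{src_host} #{db} #{table} | " \
    "mysql -h #{dst_host} #{db}"
end

cmd = copy_table_cmd("db1.example.com", "db2.example.com", "appdb", "users")
# system(cmd) would run it; credentials would come from ~/.my.cnf or flags.
```

The database's bulk loader streams rows without ever materializing them as Ruby objects, which sidesteps the memory problem entirely.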