I have recently started consulting and helping with the development of a Rails application that was using MongoDB (with Mongoid as its DB client) to store all its model instances.
This was fine while the application was in an early startup stage, but as it gained more and more clients and started to need more and more complicated queries to show proper statistics and other information in the interface, we decided that the only viable way forward was to normalize the data and move to a structured database instead.
So, we are now in the process of migrating both the tables and the data from MongoDB (with Mongoid as object-mapper) to Postgres (with ActiveRecord as object-mapper). Because we have to make sure that no improper, non-normalized data from the Mongo database slips through, we have to run these data migrations inside Rails-land, so that validations, callbacks and sanity checks are run.
Everything went 'fine' in development, but now we are running the migration on a staging server against the real production database. It turns out that for some migrations, the memory usage of the server increases linearly with the number of model instances, causing the migration to be killed once we've filled 16 GB of RAM (and another 16 GB of swap...).
Since we migrate the model instances one by one, we hope to be able to find a way to make sure that the memory usage can remain (near) constant.
The causes that currently come to mind are (a) ActiveRecord or Mongoid keeping references to the object instances we have already imported, and (b) the migration running in a single DB transaction, so that Postgres might be holding on to more and more memory until it completes.
So my question is: how can we keep the memory usage of these data migrations (near) constant?
The data migrations have roughly the following format:
class MigrateSomeThing < ActiveRecord::Migration[5.2]
  def up
    # Mongoid's #all.each works in batches by default, see
    # https://stackoverflow.com/questions/7041224/finding-mongodb-records-in-batches-using-mongoid-ruby-adapter
    Mongodb::ModelName.all.each do |old_thing|
      create_thing(old_thing, Postgres::ModelName.new)
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  def down
    Postgres::ModelName.delete_all
  end

  def create_thing(old_thing, new_thing)
    attrs = old_thing.attributes
    # ... maybe alter the attributes slightly to fit Postgres, depending on the thing.
    new_thing.attributes = attrs
    new_thing.save!
  end
end
I suggest narrowing down the memory consumption to the reading or the writing side (or, put differently, Mongoid vs AR) by performing all of the reads but none of the model creation/writes and seeing if memory usage is still growing.
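For example, a stripped-down diagnostic variant of the migration (the class name here is made up, and the model names are reused from the question) could run only the Mongoid read side and skip Postgres entirely; if the process memory still grows while this runs, the growth is on the Mongoid/read side, otherwise it is on the AR/write side:

class DiagnoseMigrationMemory < ActiveRecord::Migration[5.2]
  # Read-only diagnostic: iterate the Mongoid collection exactly like the real
  # migration, but never build or save an ActiveRecord model. Watch the
  # process RSS while this runs.
  def up
    Mongodb::ModelName.all.each do |old_thing|
      old_thing.attributes # touch the document; no Postgres::ModelName involved
    end
  end

  def down
    # nothing to undo, no data is written
  end
end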
Mongoid performs finds in batches by default, unlike AR, where this has to be requested through find_in_batches.
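For comparison, on the ActiveRecord side batched reads have to be asked for explicitly, roughly like this (model name taken from the question):

# AR loads the whole result set into memory by default; batching must be
# requested explicitly, e.g. with find_each / find_in_batches.
Postgres::ModelName.find_each(batch_size: 1000) do |record|
  # records are loaded 1000 at a time under the hood and yielded one by one,
  # so each batch can be garbage collected once it has been processed
end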
Since ActiveRecord migrations are wrapped in a transaction by default, and AR performs attribute value tracking so it can restore model instances' attributes to their previous values if the transaction fails to commit, it is likely that all of the AR models being created remain in memory and cannot be garbage collected until the migration finishes. Possible solutions are:
Disable the implicit transaction for the migration in question (https://apidock.com/rails/ActiveRecord/Migration):
disable_ddl_transaction!
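Applied to the migration from the question, that would look roughly like the sketch below. Note that without the wrapping transaction, a failed run will not automatically roll back the rows that were already inserted.

class MigrateSomeThing < ActiveRecord::Migration[5.2]
  # Opt this migration out of the implicit wrapping transaction, so each
  # #save! commits on its own and finished records can be garbage collected.
  disable_ddl_transaction!

  def up
    Mongodb::ModelName.all.each do |old_thing|
      create_thing(old_thing, Postgres::ModelName.new)
    end
  end

  # ... down and create_thing as before
end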
Create the data via direct inserts, bypassing model instantiation entirely (this will also speed up the process). The most basic way is via raw SQL (see "Rails ActiveRecord: Getting the id of a raw insert"); there are also libraries for this (see "Bulk Insert records into Active Record table").
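A minimal raw-SQL sketch of that second option follows; the table name and attribute handling are hypothetical and have to be adapted to the real schema, and keep in mind that raw inserts bypass AR validations and callbacks, so any sanity checks would need to run separately.

conn = ActiveRecord::Base.connection
Mongodb::ModelName.all.each do |old_thing|
  # Map/rename the Mongo attributes to the Postgres columns here; "_id" is
  # dropped as an example of a field that has no direct Postgres equivalent.
  attrs = old_thing.attributes.except("_id")
  columns = attrs.keys.map { |key| conn.quote_column_name(key) }.join(", ")
  values  = attrs.values.map { |value| conn.quote(value) }.join(", ")
  # No AR model is instantiated, so nothing accumulates in the transaction's
  # record tracking; each row is sent straight to Postgres.
  # "model_names" stands in for the real table name.
  conn.execute("INSERT INTO model_names (#{columns}) VALUES (#{values})")
end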