How to obtain constant memory usage when migrating a Rails application from mongoid (MongoDB) to ActiveRecord (Postgres)?

I have recently started consulting and helping with the development of a Rails application that was using MongoDB (with Mongoid as its DB client) to store all its model instances.

This was fine while the application was in an early startup stage, but as the application gained more and more clients, and as increasingly complicated queries were needed to show proper statistics and other information in the interface, we decided that the only viable way forward was to normalize the data and move to a structured database instead.

So, we are now in the process of migrating both the tables and the data from MongoDB (with Mongoid as object-mapper) to Postgres (with ActiveRecord as object-mapper). Because we have to make sure that no improper, non-normalized data is carried over from the Mongo database, we run these data migrations inside Rails-land, so that validations, callbacks and sanity checks are applied.

Everything went 'fine' in development, but now we are running the migration on a staging server against the real production database. It turns out that for some migrations the memory usage of the server grows linearly with the number of model instances, which causes the migration to be killed once we have filled 16 GB of RAM (and another 16 GB of swap...).

Since we migrate the model instances one by one, we hope there is a way to keep the memory usage (near) constant.

The possible causes that currently come to mind are (a) ActiveRecord or Mongoid keeping references to the object instances we have already imported, and (b) the migration running in a single DB transaction, so that Postgres accumulates more and more state until it completes.

So my questions:

  • What is the probable cause of this linear memory usage?
  • How can we reduce it?
  • Are there ways to make Mongoid and/or ActiveRecord relinquish old references?
  • Should we attempt to call the Ruby GC manually?
  • Are there ways to split a data migration into multiple DB transactions, and would that help?

These data migrations have roughly the following format:

class MigrateSomeThing < ActiveRecord::Migration[5.2]
  def up
    Mongodb::ModelName.all.each do |old_thing| # Mongoid's #.all.each works with batches, see https://stackoverflow.com/questions/7041224/finding-mongodb-records-in-batches-using-mongoid-ruby-adapter 
      create_thing(old_thing, Postgres::ModelName.new)
    end
    raise "Not all rows could be imported" if MongoDB::ModelName.count != Postgres::ModelName.count
  end

  def down
    Postgres::ModelName.delete_all
  end

  def create_thing(old_thing, new_thing)
    attrs = old_thing.attributes
    # ... maybe alter the attributes slightly to fit Postgres depending on the thing.
    new_thing.attributes = attrs
    new_thing.save!
  end

end
asked May 30 '19 by Qqwy
1 Answer

I suggest narrowing down the memory consumption to the reading or the writing side (or, put differently, Mongoid vs AR) by performing all of the reads but none of the model creation/writes and seeing if memory usage is still growing.
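
A minimal way to run this test is to iterate the Mongoid collection and touch the attributes without ever building an AR model, while printing the process's resident set size as you go. This is only a sketch, reusing the Mongodb::ModelName class from the question; the rss_mb helper and the 10,000-record reporting interval are my own additions.

def rss_mb
  # Resident set size of the current process in MB (via `ps`, Linux/macOS).
  `ps -o rss= -p #{Process.pid}`.to_i / 1024
end

Mongodb::ModelName.all.each_with_index do |old_thing, i|
  old_thing.attributes # read only; no AR model is instantiated
  puts "#{i} records read, RSS: #{rss_mb} MB" if (i % 10_000).zero?
end

If the RSS keeps climbing here, the growth is on the Mongoid/read side; if it stays flat, the AR writes are the culprit.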

Mongoid performs finds in batches by default, unlike AR, where batching has to be requested explicitly through find_in_batches.
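
For contrast, a batched read on the ActiveRecord side looks like this (a sketch using the Postgres::ModelName class from the question):

Postgres::ModelName.find_each(batch_size: 1000) do |thing|
  # find_each loads rows in batches of 1000 under the hood,
  # yielding them one at a time instead of loading the whole table.
end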

Since ActiveRecord migrations are wrapped in a transaction by default, and AR tracks attribute values so that it can restore model instances to their previous state if the transaction rolls back, it is likely that all of the AR models being created stay referenced in memory and cannot be garbage collected until the migration finishes. Possible solutions are:

  1. Disable the implicit transaction for the migration in question (https://apidock.com/rails/ActiveRecord/Migration):

    disable_ddl_transaction!

  2. Create the data via direct inserts, bypassing model instantiation entirely (this will also speed up the process). The most basic way is raw SQL (see "Rails ActiveRecord: Getting the id of a raw insert"); there are also libraries for this (see "Bulk Insert records into Active Record table"). Both approaches are sketched below.
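
Here is a sketch of the first idea applied to the migration from the question: the implicit transaction is disabled with disable_ddl_transaction!, and the import is committed in explicit per-batch transactions, so neither Postgres nor AR's rollback bookkeeping ever has to hold the entire import in flight. The batch size of 1,000 is an arbitrary assumption.

class MigrateSomeThing < ActiveRecord::Migration[5.2]
  disable_ddl_transaction!

  def up
    Mongodb::ModelName.all.each_slice(1000) do |batch|
      # Each batch commits (and becomes garbage-collectable) independently.
      ActiveRecord::Base.transaction do
        batch.each { |old_thing| create_thing(old_thing, Postgres::ModelName.new) }
      end
    end
    raise "Not all rows could be imported" if Mongodb::ModelName.count != Postgres::ModelName.count
  end

  # down and create_thing stay as in the question.
end

For the direct-insert route, note that insert_all only exists from Rails 6 onward; on 5.2 you would reach for raw SQL or a gem such as activerecord-import. A hypothetical Rails 6 version follows; the attribute mapping is illustrative, and insert_all skips the validations and callbacks that this particular migration relies on:

Mongodb::ModelName.all.each_slice(1000) do |batch|
  # insert_all writes each batch with a single multi-row INSERT,
  # skipping model instantiation, validations and callbacks entirely.
  Postgres::ModelName.insert_all(batch.map { |t| t.attributes.except("_id") })
end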

answered Nov 15 '22 by D. SM