Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Switching between multiple databases in Rails without breaking transactions

I am setting up a Rails app with multiple databases. It uses ActiveRecord::Base.establish_connection db_config to switch between the databases (all of which are configured in database.yml).

establish_connection apparently breaks pending transactions on each call. One negative consequence is testing, where use_transactional_tests must be disabled (leading to undesirably slow tests).

So ... how can a Rails app maintain multiple transactions on different databases at the same time? (To clarify, I'm not looking for a fancy cross-database transaction. Just a way for the database client, ie the Rails app, to maintain multiple transactions simultaneously, one per database.)

The only solution I've seen is putting establish_connection directly in the class definition, but that assumes you have a database dedicated to specific classes. I am applying a user-based sharding strategy, where a single record type is distributed across multiple databases, so the database needs to be switched dynamically in code.

like image 638
mahemoff Avatar asked Apr 29 '17 04:04

mahemoff


People also ask

Can rails connect to multiple databases?

Rails now has support for multiple databases so you don't have to store your data all in one place. At this time the following features are supported: Multiple writer databases and a replica for each. Automatic connection switching for the model you're working with.

What is ActiveRecord in Ruby on Rails?

Active Record is the M in MVC - the model - which is the layer of the system responsible for representing business data and logic. Active Record facilitates the creation and use of business objects whose data requires persistent storage to a database.

How Rails transaction works?

Rails transactions are tied to one database connectionAnd as long as the transaction block is running this one database connection is open. So try to do as little as needed inside the transaction block, otherwise you will be blocking a database connection for more time than you should.

What is ActiveRecord base?

ActiveRecord::Base indicates that the ActiveRecord class or module has a static inner class called Base that you're extending.


1 Answers

This is a tricky problem, because of tight coupling inside ActiveRecord, but I've managed to create some proof of concept that works. Or at least it looks like it works.

Some background

ActiveRecord uses a ActiveRecord::ConnectionAdapters::ConnectionHandler class that is responsible for storing connection pools per model. By default there is only one connection pool for all models, because usual Rails app is connected to one database.

After executing establish_connection for different database in particular model, new connection pool is created for that model. And also for all models that may inherit from it.

Before executing any query, ActiveRecord first retrieves connection pool for relevant model and then retrieves the connection from the pool.

Note that above explanation may not be 100% accurate, but it should be close.

Solution

So the idea is to replace the default connection handler with custom one that will return connection pool based on provided shard description.

This can be implemented in many different ways. I did it by creating the proxy object that is passing shard names as disguised ActiveRecord classes. Connection handler is expecting to get AR model and looks at name property and also at superclass to walk the hierarchy chain of model. I've implemented DatabaseModel class that is basically shard name, but it is behaving like AR model.

Implementation

Here is example implementation. I've used sqlite database for simplicity, you can just run this file without any setup. You can also take a look at this gist

# Define some required dependencies
require "bundler/inline"
gemfile(false) do
  source "https://rubygems.org"
  gem "activerecord", "~> 4.2.8"
  gem "sqlite3"
end

require "active_record"

class User < ActiveRecord::Base
end

DatabaseModel = Struct.new(:name) do
  def superclass
    ActiveRecord::Base
  end
end

# Setup database connections and create databases if not present
connection_handler = ActiveRecord::ConnectionAdapters::ConnectionHandler.new
resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new({
  "users_shard_1" => { adapter: "sqlite3", database: "users_shard_1.sqlite3" },
  "users_shard_2" => { adapter: "sqlite3", database: "users_shard_2.sqlite3" }
})

databases = %w{users_shard_1 users_shard_2}
databases.each do |database|
  filename = "#{database}.sqlite3"

  ActiveRecord::Base.establish_connection({
    adapter: "sqlite3",
    database: filename
  })

  spec = resolver.spec(database.to_sym)
  connection_handler.establish_connection(DatabaseModel.new(database), spec)

  next if File.exists?(filename)

  ActiveRecord::Schema.define(version: 1) do
    create_table :users do |t|
      t.string :name
      t.string :email
    end
  end
end

# Create custom connection handler
class ShardHandler
  def initialize(original_handler)
    @original_handler = original_handler
  end

  def use_database(name)
    @model= DatabaseModel.new(name)
  end

  def retrieve_connection_pool(klass)
    @original_handler.retrieve_connection_pool(@model)
  end

  def retrieve_connection(klass)
    pool = retrieve_connection_pool(klass)
    raise ConnectionNotEstablished, "No connection pool for #{klass}" unless pool
    conn = pool.connection
    raise ConnectionNotEstablished, "No connection for #{klass} in connection pool" unless conn
    puts "Using database \"#{conn.instance_variable_get("@config")[:database]}\" (##{conn.object_id})"
    conn
  end
end

User.connection_handler = ShardHandler.new(connection_handler)

User.connection_handler.use_database("users_shard_1")
User.create(name: "John Doe", email: "[email protected]")
puts User.count

User.connection_handler.use_database("users_shard_2")
User.create(name: "Jane Doe", email: "[email protected]")
puts User.count

User.connection_handler.use_database("users_shard_1")
puts User.count

I think this should give an idea how to implement production ready solution. I hope I didn't miss anything obvious here. I can suggest couple of different approaches:

  1. Subclass ActiveRecord::ConnectionAdapters::ConnectionHandler and overwrite those methods responsible for retrieving connection pools
  2. Create completely new class implementing the same api as ConnectionHandler
  3. I guess it is also possible to just overwrite retrieve_connection method. I don't remember where it is defined, but I think it is in ActiveRecord::Core.

I think approaches 1 and 2 are the way to go and should cover all cases when working with databases.

like image 128
Michał Młoźniak Avatar answered Sep 19 '22 20:09

Michał Młoźniak