Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it OK to specify a schema in `table_name_prefix`?

TL;DR: Is it OK to specify a schema in table_name_prefix?

We have a large Rails application that is not quite a traditional multi-tenant app. We have a hundred clients, all supported by one app, and that number will never grow more than 1-2 per year. Currently, every client has their own Postgresql database.

We are addressing some infrastructure concerns of having so many distinct databases...most urgently, a high number of simultaneous database connections when processing many clients' data at the same time.

The app is not visible, even to clients, so a lot of traditional multi-tenant web site philosophies don't apply here neatly.

  • Each tenant has a distinct Postgres database, managed in database.yml.
  • Each database has a schema, named for the tenant.
  • We have a model specific to each tenant with notably different code.
  • Each model uses establish_connection to select a different database and schema.
  • Each model uses a distinct table_name_prefix with the client's unique name.

The tables vary extensively for each tenant. There is no hope or desire to normalize the clients together. Clients are not provisioned dynamically -- it is always a new code release with migrations.

We intend to move each of the client schemas into one database, so fewer distinct connection pools are required. The unique names we currently have at the database, schema, and table names mean there is no possibility of name collisions.

We've looked at the Apartment gem, and decided it is not a good fit for what we're doing.

We could add all hundred schemas to schema_search_path, so all clients could share the same connection pool and still find their schema. We believe this would reduce our db connection count one-hundred-fold. But we're a bit uneasy about that. I've found no discussions of how many are too many. Perhaps that would work, and perhaps there would not have a performance penalty finding tables.

We've found a very simple solution that seems promising, by adding the schema in the table_name_prefix. We're already setting this like:

def self.table_name_prefix
  'client99_'
end

Through experimenting and looking within Rails 4 (our current version) and Rails 5 source code, this works to specify the schema ('tenant_99') as well as the traditional table prefix ('client99') :

def self.table_name_prefix
  'tenant_99.client99_'
end

Before that change, queries looked like this:

SELECT COUNT(*) FROM 'client99_products'

After, they include the schema, as desired:

SELECT COUNT(*) FROM 'tenant_99.client99_products'

This seems to answer our needs, with no downsides. I've searched the Interwebs for people encouraging or discouraging this practice, and found no mention of it either way.

So through all this, here are the questions I haven't found definitive answers for:

  • Is there a concern of having too many schemas listed in schema_search_path?
  • Is putting a schema name in table_name_prefix okay?
like image 795
David Hempy Avatar asked May 14 '19 21:05

David Hempy


People also ask

Why you should always use schema name in queries?

The reason is simple: SQL Server does cache the query plan for ad-hoc queries, but if the schema name isn't present in the query the cache can't be re-used for other users, only for the same user.

What is Postgres default schema?

In PostgreSQL, by default, every database owns a default Schema named public. If you do not mention schema_name while creating or accessing the object then PostgreSQL will consider the schema_name as public.

What is schema prefix?

Schemas can be used as prefixes within Drupal. That is, with a multisite setup, each site can reside in its own schema, and shared tables can reside in a "shared" schema (or even in the public schema).

What is default schema DBO?

The dbo schema is the default schema of every database. By default, users created with the CREATE USER Transact-SQL command have dbo as their default schema. The dbo schema is owned by the dbo user account. Users who are assigned the dbo as default schema don't inherit the permissions of the dbo user account.


1 Answers

To address your concerns in reverse order:

  • Is putting a schema name in table_name_prefix okay?

There are no problems with this just as long as the names are unique(internal and external).

  • Is there a concern of having too many schemas listed in schema_search_path?

The answer is maybe, any non-fully qualified request(asking for a table by name only) will have to search each of the schemas in the order listed in schema_search_path If it is cached in memory there is little penalty; an on-disk search of all schemas will be slow(proportional to its location in the list.) Be sure to list the most active schemas first.

A fully qualified request should take no longer than the separated database solution.

Assuming all of your calls are fully qualified, this technique should provide the full advantages of connection pooling, when possible.

Remember that connection pooling only minimizes the overhead of the setup and tear-down times of the connections, by taking advantage of "gaps" during communication.

For example:

  • You have four clients and three of them are making near constant requests, you will still have four connections to the server, even with pooling.

  • The advantage comes when you have four clients each utilizing a quarter of the resources, pooled over a single connection.

The underlying(excluding connection overhead) database utilization will remain the same, whether pooling with a single database or separate connections to separate databases.

The drawback/advantage to combining the databases into a single one is this: it is not possible to move individual databases to another server for load balancing purposes outside of PostgreSQLs methods for load balancing.

like image 155
Strom Avatar answered Sep 29 '22 07:09

Strom