TL;DR: Is it OK to specify a schema in table_name_prefix
?
We have a large Rails application that is not quite a traditional multi-tenant app. We have a hundred clients, all supported by one app, and that number will never grow more than 1-2 per year. Currently, every client has their own Postgresql database.
We are addressing some infrastructure concerns of having so many distinct databases...most urgently, a high number of simultaneous database connections when processing many clients' data at the same time.
The app is not visible, even to clients, so a lot of traditional multi-tenant web site philosophies don't apply here neatly.
table_name_prefix
with the client's unique name.The tables vary extensively for each tenant. There is no hope or desire to normalize the clients together. Clients are not provisioned dynamically -- it is always a new code release with migrations.
We intend to move each of the client schemas into one database, so fewer distinct connection pools are required. The unique names we currently have at the database, schema, and table names mean there is no possibility of name collisions.
We've looked at the Apartment gem, and decided it is not a good fit for what we're doing.
We could add all hundred schemas to schema_search_path, so all clients could share the same connection pool and still find their schema. We believe this would reduce our db connection count one-hundred-fold. But we're a bit uneasy about that. I've found no discussions of how many are too many. Perhaps that would work, and perhaps there would not have a performance penalty finding tables.
We've found a very simple solution that seems promising, by adding the schema in the table_name_prefix. We're already setting this like:
def self.table_name_prefix
'client99_'
end
Through experimenting and looking within Rails 4 (our current version) and Rails 5 source code, this works to specify the schema ('tenant_99') as well as the traditional table prefix ('client99') :
def self.table_name_prefix
'tenant_99.client99_'
end
Before that change, queries looked like this:
SELECT COUNT(*) FROM 'client99_products'
After, they include the schema, as desired:
SELECT COUNT(*) FROM 'tenant_99.client99_products'
This seems to answer our needs, with no downsides. I've searched the Interwebs for people encouraging or discouraging this practice, and found no mention of it either way.
So through all this, here are the questions I haven't found definitive answers for:
schema_search_path
?table_name_prefix
okay? The reason is simple: SQL Server does cache the query plan for ad-hoc queries, but if the schema name isn't present in the query the cache can't be re-used for other users, only for the same user.
In PostgreSQL, by default, every database owns a default Schema named public. If you do not mention schema_name while creating or accessing the object then PostgreSQL will consider the schema_name as public.
Schemas can be used as prefixes within Drupal. That is, with a multisite setup, each site can reside in its own schema, and shared tables can reside in a "shared" schema (or even in the public schema).
The dbo schema is the default schema of every database. By default, users created with the CREATE USER Transact-SQL command have dbo as their default schema. The dbo schema is owned by the dbo user account. Users who are assigned the dbo as default schema don't inherit the permissions of the dbo user account.
To address your concerns in reverse order:
There are no problems with this just as long as the names are unique(internal and external).
schema_search_path
?The answer is maybe, any non-fully qualified request(asking for a table by name only) will have to search each of the schemas in the order listed in schema_search_path
If it is cached in memory there is little penalty; an on-disk search of all schemas will be slow(proportional to its location in the list.)
Be sure to list the most active schemas first.
A fully qualified request should take no longer than the separated database solution.
Assuming all of your calls are fully qualified, this technique should provide the full advantages of connection pooling, when possible.
Remember that connection pooling only minimizes the overhead of the setup and tear-down times of the connections, by taking advantage of "gaps" during communication.
For example:
You have four clients and three of them are making near constant requests, you will still have four connections to the server, even with pooling.
The advantage comes when you have four clients each utilizing a quarter of the resources, pooled over a single connection.
The underlying(excluding connection overhead) database utilization will remain the same, whether pooling with a single database or separate connections to separate databases.
The drawback/advantage to combining the databases into a single one is this: it is not possible to move individual databases to another server for load balancing purposes outside of PostgreSQLs methods for load balancing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With