Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it better to use multiple databases with one schema each, or one database with multiple schemas?

People also ask

Can a schema have multiple databases?

In the Oracle database system, the term database schema, which is also known as "SQL schema," has a different meaning. Here, a database can have multiple schemas (or “schemata,” if you're feeling fancy). Each one contains all the objects created by a specific database user.

Is it better to have multiple databases?

it is better to use different database for each client. Maintaining multiple databases is difficult. For example if we want to modify a table then we should do changes in all databases. Grouping of data into multiple databases each with a significantly fewer number of tables.

Why have multiple schemas in a database?

Using multiple private schemas is an effective way of separating database users from one another when sensitive information is involved. Typically a user is granted access to only one schema and its contents, thus providing database security at the schema level.

How many schemas can we have in a database?

So, the maximum numbers of schemas in a database is limited to the maximum number of integer data type (2^31-1 is the max of int).


A PostgreSQL "schema" is roughly the same as a MySQL "database". Having many databases on a PostgreSQL installation can get problematic; having many schemas will work with no trouble. So you definitely want to go with one database and multiple schemas within that database.


Definitely, I'll go for the one-db-many-schemas approach. This allows me to dump all the database, but restore just one very easily, in many ways:

  1. Dump the db (all the schema), load the dump in a new db, dump just the schema I need, and restore back in the main db.
  2. Dump the schema separately, one by one (but I think the machine will suffer more this way - and I'm expecting like 500 schemas!)

Otherwise, googling around I've seen that there is no auto-procedure to duplicate a schema (using one as a template), but many suggest this way:

  1. Create a template-schema
  2. When need to duplicate, rename it with new name
  3. Dump it
  4. Rename it back
  5. Restore the dump
  6. The magic is done.

I've written two rows in Python to do that; I hope they can help someone (in-2-seconds-written-code, don’t use it in production):

import os
import sys
import pg

# Take the new schema name from the second cmd arguments (the first is the filename)
newSchema = sys.argv[1]

# Temperary folder for the dumps
dumpFile = '/test/dumps/' + str(newSchema) + '.sql'

# Settings
db_name = 'db_name'
db_user = 'db_user'
db_pass = 'db_pass'
schema_as_template = 'schema_name'

# Connection
pgConnect = pg.connect(dbname= db_name, host='localhost', user= db_user, passwd= db_pass)

# Rename schema with the new name
pgConnect.query("ALTER SCHEMA " + schema_as_template + " RENAME TO " + str(newSchema))

# Dump it
command = 'export PGPASSWORD="' + db_pass + '" && pg_dump -U ' + db_user + ' -n ' + str(newSchema) + ' ' + db_name + ' > ' + dumpFile
os.system(command)

# Rename back with its default name
pgConnect.query("ALTER SCHEMA " + str(newSchema) + " RENAME TO " + schema_as_template)

# Restore the previous dump to create the new schema
restore = 'export PGPASSWORD="' + db_pass + '" && psql -U ' + db_user + ' -d ' + db_name + ' < ' + dumpFile
os.system(restore)

# Want to delete the dump file?
os.remove(dumpFile)

# Close connection
pgConnect.close()

I would say, go with multiple databases AND multiple schemas :)

Schemas in PostgreSQL are a lot like packages in Oracle, in case you are familiar with those. Databases are meant to differentiate between entire sets of data, while schemas are more like data entities.

For instance, you could have one database for an entire application with the schemas "UserManagement", "LongTermStorage" and so on. "UserManagement" would then contain the "User" table, as well as all stored procedures, triggers, sequences, etc. that are needed for the user management.

Databases are entire programs, schemas are components.


I would recommend against accepted answer - multiple databases instead of multiple schemas for this set of reasons:

  1. If you are running microservices, you want to enforce the inability to join between your "schemas", so the data is not entangled and developers won't end up joining other microservice's schema and wonder why when other team makes a change their stuff no longer works.
  2. You can later migrate to a separate database machine if your load requires with ease.
  3. If you need to have a high-availability and/or replication set up, it's better to have separate databases completely independent of each other. You cannot replicate one schema only compared to the whole database.

In a PostgreSQL context I recommend to use one db with multiple schemas, as you can (e.g.) UNION ALL across schemas, but not across databases. For that reason, a database is really completely insulated from another database while schemas are not insulated from other schemas within the same database.

If you -for some reason- have to consolidate data across schemas in the future, it will be easy to do this over multiple schemas. With multiple databases you would need multiple db-connections and collect and merge the data from each database "manually" by application logic.

The latter have advantages in some cases, but for the major part I think the one-database-multiple-schemas approach is more useful.


A number of schemas should be more lightweight than a number of databases, although I cannot find a reference which confirms this.

But if you really want to keep things very separate (instead of refactoring the web application so that a "customer" column is added to your tables), you may still want to use separate databases: I assert that you can more easily make restores of a particular customer's database this way -- without disturbing the other customers.