Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple schemas versus enormous tables [closed]

Consider a mobile device manager system that contains information for every user such as a table that stores the apps that he has installed on the phone, auditing details, notification information etc. Is it wise to create a seperate schema for each user with the corresponding tables? The number of tables is large for a single user amounting to about 30 tables each. Would it be better to have a seperate schema where all this information is placed into these tables (in turn creating enormous tables?) or have a schema for each user?

Thanks in advance

like image 891
michelle Avatar asked Dec 01 '11 11:12

michelle


People also ask

Can a table be in multiple schemas?

IOW, each table just belongs to only one schema.

What is difference between schema and table?

What is the difference between Schema and Table? A database schema describes the structure and organization of data in a database system, while a table is a data set in which the data is organized in to a set of vertical columns and horizontal rows.

What is multiple schema?

Using Multiple Private Schemas. Using multiple private schemas is an effective way of separating database users from one another when sensitive information is involved. Typically a user is granted access to only one schema and its contents, thus providing database security at the schema level.

What is multi schema approach in cloud computing?

It is possible to run Process Services in “multi-schema multi-tenancy” mode (MS-MT). This is a multi-tenant setup where every tenant has its own database schema. This means that the data of one tenant is completely separated from the data of other tenants.


1 Answers

I want to see which method is more efficient in terms of querying in the database.

In a multi-tenant database, querying is only part of the problem. Other parts of the problem are cost, data isolation and protection, maintenance, and disaster recovery. These are significant; you can't consider only query efficiency in a multi-tenant database.

Multi-tenant solutions range from one database per tenant (shared nothing) to one row per tenant (shared everything).

"Shared nothing", "separate database", or one database per tenant

  • Most expensive per client. (Large numbers of clients imply large numbers of servers.)
  • Highest degree of data isolation.
  • Disaster recovery for a single tenant is simple and straightforward.
  • Maintenance is theoretically harder, because changes need to be carried out in every database. But your dbms might easily support running stored procedures in each database. (SQL Server has an undocumented system stored procedure, sp_msforeachdb, for example. You can probably write your own.)
  • "Shared nothing" is the most easily customizable, too, but that also raises more maintenance issues. For example, each tenant might have a different pattern of usage, which suggests each tenant might need some indexes that other tenants wouldn't. That's trivial to do with "shared nothing"; impossible with "shared everything" (below).
  • Lowest number of rows per table. Querying speed is near optimal.

"Shared everything", or "shared schema", or "one database per planet"

  • Least expensive per tenant.
  • Lowest degree of data isolation. Every table has a column that identifies which tenant a row belongs to. Since tenant rows are mixed in every table, it's relatively simple to accidentally expose other tenant's data.
  • Disaster recovery for a single tenant is relatively complicated; you have to restore individual rows in many tables. On the other hand, a single-tenant disaster is relatively unusual. Most disasters will probably affect all tenants.
  • Structural maintenance is simpler, given that all tenants share the tables. It increases the communication load, though, because you have to communicate and coordinate each change with every tenant. It's not easily customizable.
  • Highest number of rows per table. Quick querying is harder, but it depends on how many tenants and how many rows. You could easily tip over into VLDB territory.

Between "shared nothing" and "shared everything" is "shared schema".

"Shared schema"

  • Tenants share a database, but each tenant has it's own named schema. Cost falls between "shared nothing" and "shared everything"; big systems typically need fewer servers than "shared nothing", more servers than "shared everything".
  • Much better isolation than "shared everything". Not quite as much isolation as "shared nothing". (You can GRANT and REVOKE permissions on schemas.)
  • Disaster recovery for a single tenant require restoring one of many schemas. This is either relatively easy or fairly hard, depending on your dbms.
  • Maintenance is easier than "shared nothing"; not as easy as "shared everything". It's relatively simple to write a stored procedure that will execute in each schema in a database. It's easier to share common tables among tenants than with "shared nothing".
  • Usually more active tenants per server than "shared nothing", which means they share (degrade) more resources. But not as bad as "shared everything".

Microsoft has a good article on multi-tenant architecture with more details. (The link is to just one page of a multi-page document. Microsoft has since removed that page; the link is now to a copy in archive.org.)

like image 82
Mike Sherrill 'Cat Recall' Avatar answered Oct 17 '22 13:10

Mike Sherrill 'Cat Recall'