I used to think schema were the "upper wrapper" object before the database itself. I mean DB.schema.<what_ever_object_name_under_schema>
.
Well, the catalog "wrapper" is now quite confusing. Why should we need a catalog? For what purpose, precisely should the catalog be used?
Catalog name is the database name. Schema name corresponds to the database owner. Catalog name is the database name. Schema name corresponds to the catalog owner, such as dbo.
A library catalog allows you to search for items owned by a specific library, including books, movies, journals, magazines, music scores, government documents, and more. Library databases allow you to efficiently search for published information such as magazine, journal, and newspaper articles.
A database catalog of a database instance consists of metadata in which definitions of database objects such as base tables, views (virtual tables), synonyms, value ranges, indexes, users, and user groups are stored.
Mar 2, 2017. The heart of any relational database management system is the system catalog that documents the database objects and system settings being used. The system catalog offers a wealth of information about your DBMS. You can think of it as the knowledge base of every piece of data known to the system.
Mike Sherrill 'Cat Recall' gave an excellent answer. I'll add simply one example: Postgres.
When you install Postgres on a machine, that installation is called a cluster. ‘Cluster’ here is not meant in the hardware sense of multiple computers working together. In Postgres, cluster refers to the fact that you can have multiple unrelated databases all up and running using the same Postgres server engine.
The word cluster is also defined by the SQL Standard in the same way as in Postgres. Closely following the SQL Standard is a primary goal of the Postgres project.
The SQL-92 specification says:
A cluster is an implementation-defined collection of catalogs.
and
Exactly one cluster is associated with an SQL-session
That's an obtuse way of saying a cluster is a database server (each catalog is a database).
So in both Postgres and the SQL Standard we have this containment hierarchy:
This diagram represents a single cluster. In the case of Postgres, you can have more than one cluster per host computer (or virtual OS). Multiple clusters is commonly done, for testing and deploying new versions of Postgres (ex: 9.0, 9.1, 9.2, 9.3, 9.4, 9.5).
If you did have multiple clusters, imagine the diagram above duplicated.
Different port numbers allow the multiple clusters to live side-by-side all up and running at the same time. Each cluster would be assigned its own port number. The usual 5432
is only the default, and can be set by you. Each cluster is listening on its own assigned port for incoming database connections.
For example, a company could have two different software development teams. One writes software to manage the warehouses while the other team builds software to manage sales and marketing. Each dev team has their own database, blissfully unaware of the other’s.
But the IT operations team took a decision to run both databases on a single computer box (Linux, Mac, whatever). So on that box they installed Postgres. So one database server (database cluster). In that cluster, they create two catalogs, a catalog for each dev team: one named 'warehouse' and one named 'sales'.
Each dev team uses many dozens of tables with different purposes and access roles. So each dev team organizes their tables into schemas. By coincidence, both dev teams do some tracking of accounting data, so each team happens to have a schema named 'accounting'. Using the same schema name is not a problem because the catalogs each have their own namespace so no collision.
Furthermore, each team eventually creates a table for accounting purposes named 'ledger'. Again, no naming collision.
You can think of this example as a hierarchy…
Postgres 9.2
cluster (installation) warehouse
catalog (database) inventory
schema accounting
schema ledger
tablesales
catalog (database) selling
schema accounting
schema (coincidental same name as above) ledger
table (coincidental same name as above)Postgres 9.3
cluster Each dev team's software makes a connection to the cluster. When doing so, they must specify which catalog (database) is theirs. Postgres requires that you connect to one catalog, but you are not limited to that catalog. That initial catalog is merely a default, used when your SQL statements omit the name of a catalog.
So if the dev team ever needs to access the other team's tables, they may do so if the database administrator has given them privileges to do so. Access is made with explicit naming in the pattern: catalog.schema.table. So if the 'warehouse' team needs to see the other team’s ('sales' team) ledger, they write SQL statements with sales.accounting.ledger
. To access their own ledger, they merely write accounting.ledger
. If they access both ledgers in the same piece of source code, they may choose to avoid confusion by including their own (optional) catalog name, warehouse.accounting.ledger
versus sales.accounting.ledger
.
By the way…
You may hear the word schema used in a more general sense, meaning the entire design of a particular database's table structure. By contrast, in the SQL Standard the word means specifically the particular layer in the Cluster > Catalog > Schema > Table
hierarchy.
Postgres uses both the word database as well as catalog in various places such as the CREATE DATABASE command.
Not all database system provides this full hierarchy of Cluster > Catalog > Schema > Table
. Some have only a single catalog (database). Some have no schema, just one set of tables. Postgres is an exceptionally powerful product.
From the relational point of view :
The catalog is the place where--among other things--all of the various schemas (external, conceptual, internal) and all of the corresponding mappings (external/conceptual, conceptual/internal) are kept.
In other words, the catalog contains detailed information (sometimes called descriptor information or metadata) regarding the various objects that are of interest to the system itself.
For example, the optimizer uses catalog information about indexes and other physical storage structures, as well as much other information, to help it decide how to implement user requests. Likewise, the security subsystem uses catalog information about users and security constraints to grant or deny such requests in the first place.
An Introduction to Database Systems, 7th ed., C.J. Date, p 69-70.
Catalogs are named collections of schemas in an SQL-environment. An SQL-environment contains zero or more catalogs. A catalog contains one or more schemas, but always contains a schema named INFORMATION_SCHEMA that contains the views and domains of the Information Schema.
Database Language SQL, (Proposed revised text of DIS 9075), p 45
A catalog is often synonymous with database. In most SQL dbms, if you query the information_schema views, you'll find that values in the "table_catalog" column map to the name of a database.
If you find your platform using catalog in a broader way than any of these three definitions, it might be referring to something broader than a database--a database cluster, a server, or a server cluster. But I kind of doubt that, since you'd have found that easily in your platform's documentation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With