Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Composite Primary key vs additional "ID" column?

If we had a table like this:

Books (pretend "ISBN" doesn't exist)

  • Author
  • Title
  • Edition
  • Year of publication
  • Price

One could argue that {Author,Title,Edition} could be a candidate/primary key.

What determines whether the candidate/primary key should be {Author,Title,Edition} or whether an ID column should be used, with {Author,Title,Edition} a unique index/key constraint?

So is

  • Author (PK)
  • Title (PK)
  • Edition (PK)
  • Year of publication
  • Price

better, or:

  • ID (PK)
  • Author
  • Title
  • Edition
  • Year of publication
  • Price

where {Author,Title,Edition} is an additional unique index/constraint?

like image 645
user997112 Avatar asked Jan 29 '13 17:01

user997112


People also ask

Is it good to have composite primary key?

There is no conclusion that composite primary keys are bad. The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row. SQL (and the relational model) allows a composite primary key.

Why might we use a composite key instead of a normal single field primary key?

Composite primary key solved the problem of data uniqueness by creating a combined unique key. While, it can be also not convenient when you need to join different tables.

When would you use a composite primary key?

A Composite Primary Key is created by combining two or more columns in a table that can be used to uniquely identify each row in the table when the columns are combined, but it does not guarantee uniqueness when taken individually, or it can also be understood as a primary key created by combining two or more ...

What is the difference between primary key and identity column?

An identity is simply an auto-increasing column. A primary key is the unique column or columns that define the row. These two are often used together, but there's no requirement that this be so.


2 Answers

Say that {Author, Title, Edition} uniquely identifies a book, then the following holds:

  1. It is a superkey -- uniquely identifies a tuple (row).

  2. It is irreducible -- removing any of the columns does not make it a key any more.

  3. It is a candidate key -- an irreducible superkey is a candidate key.

Now let's consider the ID (integer)

I can reason that the Book table key will show up in few other tables as a foreign key and also in few indexes. So, it will take quite a bit of space -- say three columns x 40 characters (or whatever...) -- in each of these tables plus in matching indexes.

In order to make these "other" tables and indexes smaller, I can add a unique-integer-column to the Book table to be used as a key which will be referenced as a foreign key. Say something like:

alter table Book add BookID integer not null identity; 

With BookID being (must be) unique too, the Book table now has two candidate keys.

Now I can select the BookID as a primary key.

alter table Book add constraint pk_Book primary key (BookID); 

However, the {Author,Title,Edition} must stay a key (unique) in order to prevent something like this:

BookID  Author      Title           Edition -----------------------------------------------   1      C.J.Date  Database Design     1   2      C.J.Date  Database Design     1 

To sum it up, adding the BookID -- and choosing it as the primary -- did not stop {Author, Title, Edition} being a (candidate) key. It still must have its own unique constraint and usually the matching index.

Also note that from the design point, this decision was done on the "physical level". In general, on the logical level of design, this ID does not exists -- it got introduced during the consideration of column sizes and indexes. So, the physical schema was derived from the logical one. Depending on the DB size, RDBMS and hardware used, none of that size-reasoning may have measurable effect -- so using {Author, Title, Edition} as a PK may be perfectly good design -- until proven differently.

like image 125
Damir Sudarevic Avatar answered Oct 13 '22 22:10

Damir Sudarevic


In general, you don't want the primary key to change value. This is why blind or surrogate primary keys are used.

Let's assume you created your Book table with Author as part of the primary key.

Suppose you found out after about a year that you misspelled "Ray Bradbury". Or even worse, you misspelled "Rachael Bloom". Just imagine how many database rows you would have to modify to correct the misspelling. Just imagine how many index references have to be changed.

However, if you have an Author table with a surrogate key, you only have to correct one row. No indexes have to be changed.

Finally, database table names are usually singular (Book), rather than plural (Books).

like image 44
Gilbert Le Blanc Avatar answered Oct 14 '22 00:10

Gilbert Le Blanc