
SQL: Primary key column. Artificial "Id" column vs "Natural" columns [duplicate]

Possible Duplicate:
Relational database design question - Surrogate-key or Natural-key?

When I create a relational table, it is tempting to pick as the primary key a column whose values are unique. But for optimization and uniformity, I create an artificial Id column every time. If there is a column (or combination of columns) that should be unique, I create a unique index on it instead of marking it as the (composite) primary key.

Is it really good practice to always prefer an artificial "Id" column plus unique indexes over natural columns for a primary key?

Andrew Florko asked Feb 10 '11 17:02


4 Answers

This is a bit of a religious debate. My personal preference is to have synthetic primary keys rather than natural primary keys but there are good arguments on both sides. Realistically, so long as you are consistent and reasonable, either approach can work well.

If you use natural keys, the two major downsides are the presence of composite keys and mutating primary key values. If you have composite primary keys, you'd obviously have to have multiple columns in each child table. That can get unwieldy from a data model perspective when there are many relationships among entities. But it can also cause grief for people developing queries-- it's awfully easy to create queries that use N-1 of N join conditions and get almost the right result. If you have natural keys, you'll also inevitably encounter a situation where the natural key value changes and you then have to ripple that change through many different entities-- that's vastly more complicated than changing a unique value in the table.
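The "N-1 of N join conditions" trap is easy to reproduce. The sketch below uses Python's built-in sqlite3 module and two hypothetical tables (document and review, names invented for illustration) keyed on a composite natural key. Dropping one of the two join conditions still returns plausible-looking rows, just too many of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical parent table with a composite natural key.
    CREATE TABLE document (
        doc_name TEXT NOT NULL,
        revision INTEGER NOT NULL,
        title    TEXT NOT NULL,
        PRIMARY KEY (doc_name, revision)
    );
    -- The child table has to carry both key columns.
    CREATE TABLE review (
        doc_name TEXT NOT NULL,
        revision INTEGER NOT NULL,
        reviewer TEXT NOT NULL,
        FOREIGN KEY (doc_name, revision) REFERENCES document (doc_name, revision)
    );
    INSERT INTO document VALUES ('spec', 1, 'Spec draft'), ('spec', 2, 'Spec final');
    INSERT INTO review   VALUES ('spec', 1, 'alice'), ('spec', 2, 'bob');
""")

# Correct join: both columns of the composite key are matched.
full_join = conn.execute("""
    SELECT r.reviewer, d.title
    FROM review r
    JOIN document d ON d.doc_name = r.doc_name AND d.revision = r.revision
    ORDER BY r.reviewer, d.title
""").fetchall()

# Buggy join: the revision condition was forgotten (N-1 of N conditions).
partial_join = conn.execute("""
    SELECT r.reviewer, d.title
    FROM review r
    JOIN document d ON d.doc_name = r.doc_name
    ORDER BY r.reviewer, d.title
""").fetchall()

print(len(full_join), len(partial_join))
```

Every correct row is still present in the buggy result, which is what makes the mistake hard to spot in a quick check.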

On the other hand, if you use synthetic keys, you're wasting space by adding additional columns, adding additional overhead to maintain an additional index, and you're increasing the risk that you'll get functionally duplicated results. It's awfully easy to either forget to create a unique constraint on the business key or to see that there is a non-unique index on the combination and just assume that it was a unique index. I actually just got bitten by this particular failing a couple days ago-- I had indexed the composite natural key (with a non-unique index) rather than creating a unique constraint. Dumb mistake but one that's relatively easy to make.
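To make that failure mode concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and index names (part_loose, part_strict, ix_part_loose) are made up for the example. Both tables have a synthetic key, but only one actually protects the business key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Synthetic key plus a plain (non-unique) index on the business key.
    CREATE TABLE part_loose (
        part_id   INTEGER PRIMARY KEY,
        part_name TEXT NOT NULL,
        revision  INTEGER NOT NULL
    );
    CREATE INDEX ix_part_loose ON part_loose (part_name, revision);

    -- Synthetic key plus a UNIQUE constraint on the business key.
    CREATE TABLE part_strict (
        part_id   INTEGER PRIMARY KEY,
        part_name TEXT NOT NULL,
        revision  INTEGER NOT NULL,
        UNIQUE (part_name, revision)
    );
""")

def insert_twice(table):
    """Insert the same business key twice; return how many rows were stored."""
    for _ in range(2):
        try:
            conn.execute(
                f"INSERT INTO {table} (part_name, revision) VALUES ('bolt', 1)"
            )
        except sqlite3.IntegrityError:
            pass  # the duplicate was rejected by the constraint
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

loose_rows = insert_twice("part_loose")    # duplicate slips through
strict_rows = insert_twice("part_strict")  # duplicate is rejected
print(loose_rows, strict_rows)
```

The plain index speeds up lookups but does nothing for integrity, so the first table silently ends up with two rows for the same part and revision.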

From a query writing and naming convention standpoint, I would also tend to prefer synthetic keys because it's nice to know when you're joining tables that the primary key of A is going to be A_ID and the primary key of B is going to be B_ID. That's far more self-documenting than trying to remember that the primary key of A is the combination of A_NAME and A_REVISION_NUMBER and that the primary key of B is B_CODE.

Justin Cave answered Sep 22 '22 10:09


There is little or no difference between a key enforced through a PRIMARY KEY constraint and a key enforced through a UNIQUE constraint. What's important is that you enforce ALL the keys necessary from a data integrity perspective. Usually that means at least one "natural" key (a key exposed to the users/consumers of the data and used to identify the facts about the universe of discourse) per table.
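As a rough illustration of that equivalence, the following sketch (Python's sqlite3 module, with hypothetical country and city tables) declares one key as PRIMARY KEY and another as NOT NULL UNIQUE. Both reject duplicates, and a foreign key can reference the UNIQUE one. Details such as NULL handling and default clustering vary by DBMS, so treat this as SQLite behaviour only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE country (
        country_id INTEGER PRIMARY KEY,   -- surrogate key
        iso_code   TEXT NOT NULL UNIQUE   -- natural key, enforced as well
    );
    CREATE TABLE city (
        city_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        -- The foreign key targets the UNIQUE key, not the primary key.
        iso_code TEXT NOT NULL REFERENCES country (iso_code)
    );
    INSERT INTO country (iso_code) VALUES ('NO');
""")

def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

dup_unique = try_insert("INSERT INTO country (iso_code) VALUES ('NO')")
dup_primary = try_insert("INSERT INTO country (country_id, iso_code) VALUES (1, 'SE')")
fk_ok = try_insert("INSERT INTO city (name, iso_code) VALUES ('Oslo', 'NO')")
fk_bad = try_insert("INSERT INTO city (name, iso_code) VALUES ('Paris', 'FR')")
print(dup_unique, dup_primary, fk_ok, fk_bad)
```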

Optionally you might also want to create "technical" keys (usually called surrogate keys) that support application and database features rather than the end user. That should be very much a secondary consideration, however. In the interests of simplicity (and very often performance as well) it usually makes sense to create surrogate keys only where you have identified a particular need for them, and not before.

nvogel answered Sep 22 '22 10:09


It depends on your natural columns. If they are small and steadily increasing, then they are good candidates for the primary key.

  • Small - the smaller the key, the more values you can fit on a single index page, and the faster your index scans will be.
  • Steadily increasing - produces fewer index reshuffles as the table grows, improving performance.
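
A back-of-the-envelope calculation shows why key size matters. The numbers below are assumptions chosen for illustration: an 8192-byte page, an 8-byte row pointer per index entry, and no page or row overhead. Real engines differ, so read the results as relative sizes rather than exact capacities.

```python
PAGE_BYTES = 8192  # assumed page size; real engines also reserve header space

def keys_per_page(key_bytes, pointer_bytes=8):
    """Rough upper bound on index entries per page, ignoring all overhead."""
    return PAGE_BYTES // (key_bytes + pointer_bytes)

estimates = {
    "4-byte integer": keys_per_page(4),
    "16-byte UUID": keys_per_page(16),
    "40-byte string": keys_per_page(40),
}

for label, count in estimates.items():
    print(f"{label}: about {count} entries per page")
```

Under these assumptions a 4-byte integer key packs roughly four times as many entries into a page as a 40-byte string key, so the same scan touches far fewer pages.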

Thomas Rushton answered Sep 21 '22 10:09


My preference is to always use an artificial key.

First, it is consistent. Anyone working on your application knows that there is a key and can make assumptions about it. This makes the schema easier to understand and maintain.

I've also seen scenarios where the natural key (e.g., a string from an HR system that identifies an employee) has to change during the life of the application. If you have an artificial key that links the natural id to your employee record, then you only have to change that natural id in one table. However, if that natural id is a primary key and you have it duplicated across a number of other tables as a foreign key, then you have a mess on your hands.
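Here is a small sketch of that scenario using Python's sqlite3 module. The employee and timesheet tables and the HR codes are invented for the example. Because child rows point at the artificial employee_id, renaming the HR code is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,    -- artificial key, never changes
        hr_code     TEXT NOT NULL UNIQUE,   -- natural id from the HR system
        full_name   TEXT NOT NULL
    );
    CREATE TABLE timesheet (
        timesheet_id INTEGER PRIMARY KEY,
        employee_id  INTEGER NOT NULL REFERENCES employee (employee_id),
        hours        REAL NOT NULL
    );
    INSERT INTO employee (employee_id, hr_code, full_name)
        VALUES (1, 'HR-0042', 'Pat Example');
    INSERT INTO timesheet (employee_id, hours) VALUES (1, 7.5), (1, 8.0);
""")

# The HR system renumbers the employee: only the employee table is touched.
cur = conn.execute("UPDATE employee SET hr_code = 'EMP-0042' WHERE employee_id = 1")
rows_touched = cur.rowcount

# Child rows still resolve through the unchanged artificial key.
hours_by_code = conn.execute("""
    SELECT e.hr_code, SUM(t.hours)
    FROM timesheet t
    JOIN employee e ON e.employee_id = t.employee_id
    GROUP BY e.hr_code
""").fetchall()
print(rows_touched, hours_by_code)
```

Had timesheet stored hr_code as its foreign key, the same rename would need a cascading update (or manual updates) in every referencing table.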

Mayo answered Sep 19 '22 10:09