Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best practice for primary keys in tables? [closed]

People also ask

What is the best data type for primary key columns and why?

A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier column.


I follow a few rules:

  1. Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This is because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, the less pages in the cache you will use.
  2. Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause of ripple effect of changes.
  3. Do NOT use "your problem primary key" as your logic model primary key. For example passport number, social security number, or employee contract number as these "natural keys" can change in real world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.


Natural verses artifical keys is a kind of religious debate among the database community - see this article and others it links to. I'm neither in favour of always having artifical keys, nor of never having them. I would decide on a case-by-case basis, for example:

  • US States: I'd go for state_code ('TX' for Texas etc.), rather than state_id=1 for Texas
  • Employees: I'd usually create an artifical employee_id, because it's hard to find anything else that works. SSN or equivalent may work, but there could be issues like a new joiner who hasn't supplied his/her SSN yet.
  • Employee Salary History: (employee_id, start_date). I would not create an artifical employee_salary_history_id. What point would it serve (other than "foolish consistency")

Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:

state_id    state_code   state_name
137         TX           Texas
...         ...          ...
249         TX           Texas

I avoid using natural keys for one simple reason -- human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.

My primary keys are all handled by the database program in the background and the user is never aware of them.


Just an extra comment on something that is often overlooked. Sometimes not using a single surrogate key as primary has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever).

Let's say we have these tables and columns:

Company:
  CompanyId   (primary key)

CostCenter:
  CompanyId   (primary key, foreign key to Company)
  CostCentre  (primary key)

CostElement
  CompanyId   (primary key, foreign key to Company)
  CostElement (primary key)

Invoice:
  InvoiceId    (primary key)
  CompanyId    (primary key, in foreign key to CostCentre, in foreign key to CostElement)
  CostCentre   (in foreign key to CostCentre)
  CostElement  (in foreign key to CostElement)

In case that last bit doesn't make sense, Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId).

In this model, it's not possible to screw-up and reference a CostElement from one company and a CostCentre from another company. If a single surrogate key was used as primary on the CostElement and CostCentre tables, and without the foreign key relations in the Invoice table, it would be.

The fewer chances to screw up, the better.


There´s no problem in making your primary key from various fields, that's a Natural Key.

You can use a Identity column (associated with a unique index on the candidate fields) to make a Surrogate Key.

That´s an old discussion. I prefer surrogate keys in most situations.

But there´s no excuse for the lack of a key.

RE: EDIT

Yeah, there´s a lot of controversy about that :D

I don´t see any obvious advantage on natural keys, besides the fact that they are the natural choice. You will always think in Name, SocialNumber - or something like that - instead of idPerson.

Surrogate keys are the answer to some of the problems that natural keys have (propagating changes for example).

As you get used to surrogates, it seems more clean, and manageable.

But in the end, you´ll find out that it's just a matter of taste - or mindset -. People "think better" with natural keys, and others don´t.


Tables should have a primary key all the time. When it doesn't it should have been an AutoIncrement fields.

Sometime people omit primary key because they transfer a lot of data and it might slow down (depend of the database) the process. BUT, it should be added after it.

Some one comment about link table, this is right, it's an exception BUT fields should be FK to keep the integrity, and is some case those fields can be primary keys too if duplicate in links is not authorized... but to keep in a simple form because exception is something often in programming, primary key should be present to keep the integrity of your data.


Besides all those good answers, I just want to share a good article I just read, The great primary-key debate.

Just to quote a few points:

The developer must apply a few rules when choosing a primary key for each table:

  • The primary key must uniquely identify each record.
  • A record’s primary-key value can’t be null.
  • The primary key-value must exist when the record is created.
  • The primary key must remain stable—you can’t change the primary-key field(s).
  • The primary key must be compact and contain the fewest possible attributes.
  • The primary-key value can’t be changed.

Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You better read through that article, it is worth your time!)