Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you determine what should be a primary key?

It's fairly general question, but I'd like to know what do you use in determination of primary key of the table. Examples supplied with you reasoning are highly desired.

I noticed that many programmers add ID column and use it as a primary key. I think, it is flawed from design point of view, as ID in that case HAS NOTHING TO DO WITH THE TABLE.

like image 995
vehomzzz Avatar asked Aug 13 '09 14:08

vehomzzz


5 Answers

The role of a primary key is to uniquely identify each row in your table. If no column or set of columns matches this requirement, an column containing a unique id is often added as a primary key.

I do not agree with your comment about programmers adding an id that has nothing to do with table data. When you need to link data across several tables, a concise id is easier to use than a compound key.

like image 56
christopheml Avatar answered Nov 20 '22 03:11

christopheml


My thought process in determining a primary key goes like this.

"One record in this table will represent ...?"

"For distinct value of Col X, Col Y, Col Z.. there should only be one row in the table", What are Cols X Y and Z ?"

The CAR_MODEL table.

Hmm this table will store information about different types of cars, should the MANUFACTURER_NAME be the key ? No, I can have many rows identifying different car models from the same manufacturer. Hmm should the MANUFACTURER_NAME and MODEL_NAME be the key ? No, i want to have different rows with the same MANUFACTURER_NAME and MODEL_NAME but different release years in the table at the same time. Ok what about "MANUFACTURER_NAME", "MODEL_NAME" and "RELEASE_YEAR".

Is it possible for me to have two rows with the same MANUFACTURER_NAME, MODEL_NAME and RELEASE_YEAR at the same time? Hmmm no. That wouldn't make sense, they would be the same Car Model, and I only want 1 record per car model. Great, that's the key.

One record in this table will represent a particular model from a particular year from a particular manufacturer. I decide this when i create the table, that's why i created the table, if you can't describe what's going in the table in terms that help identify the key you don't really understand why you are creating it.

Horrible Changes Over Time!!! (surrogate keys, Natural Key, Slowly changing dimensions)

Ah but the information I am storing about a particular Car Model (from a particular Manufacturer and Release Year) may change. Initially I was told that it had two doors, now I find it has four, I want to have this correct information in my table but not lose the old record as people have reported off it and I need to be able to reproduce their old results.

Ok, I will add a new column "MODEL_ID" and make it the primary key of the table, so I can store multiple records with the same model name, manufacturer name and release year. I will also add a valid_from and valid_to timestamp.

This can work well, and indeed with my changes the Primary Key of the table is now MODEL_ID, a surrogate key. But the Natural Key, the Business Key, the key 'at any point in time', is still Model_Name, Manufacturer_Name and Release_Year, and I can't loose sight of that.

Note on Surrogate Keys :

A surrogate key is unique for each row, by definition! A surrogate key makes it easier to manipulate data sometimes, especially data that changes over time. But a surrogate key doesn't in any way replace a Natural Primary Key, you still need to know what the 'grain' of the table is.

If we said that every person in Australia will be assigned a Stack_Overflow_User_id what would we do when Jeff and Joel started giving Stack_Overflow_User_Id's to dogs and cats and multiple IDs to the same people ??

We would say, "hey Jeff and Joel, only give out 1 ID per First_Name, Last_Name, Date_of_Birth and Place_of_Birth!". *

We need to know the natural key or we can give anything a surrogate key!

(* what about people where all these are the same ? don't we need a passport number or some sort of surrogate ? In practice a surrogate is nice and clean, but where did it originate ? originally it came from a natural key.)

like image 5
SamH Avatar answered Nov 20 '22 05:11

SamH


Hollywood Couples: Relationships are harder than you think ... one of my favorites.

like image 3
Sinan Ünür Avatar answered Nov 20 '22 03:11

Sinan Ünür


You pick anything that you know will be a unique value, preferably something numeric such as a customer ID or account number. Stay away from string based keys if at all possible. If nothing else, use a GUID value or an Auto-increment integer.

like image 3
BBlake Avatar answered Nov 20 '22 04:11

BBlake


A key should be a column where each entry is guaranteed to be unique. Examples might be things like a social insurance number or driver's license number. In theory you can tie multiple columns together into a compound key. So perhaps name and birth day might be unique together so they could be a key. However in practice nobody does that because crossing tables is a pain. The best solution is usually to add an autoincrementing value or GUID column.

like image 2
stimms Avatar answered Nov 20 '22 04:11

stimms