I have a fairly simple question about natural/surrogate key usage in a well-defined context that comes up often, and which I'm going to illustrate.
Let's assume you are designing the DB schema for a product using SQL Server 2005 as the DBMS. For the sake of simplicity, let's say only two entities are involved, mapped to two tables, Master and Slave.
Assume that:
The question is: how would you design keys/constraints/references for those tables? Which of these would you choose (please justify your choice)?
For my part, I'd go with option 2), mainly because of assumption 3) and for performance reasons, but I'd like to hear other opinions, since the topic is quite openly debated.
Natural key: an attribute that can uniquely identify a row and exists in the real world. Surrogate key: an attribute that can uniquely identify a row and does not exist in the real world. Composite key: more than one attribute that, when combined, can uniquely identify a row.
A surrogate key is a system-generated value (a GUID, sequence value, unique identifier, etc.) with no business meaning, used to uniquely identify a record in a table. The key itself may consist of one or more columns (e.g. a composite key).
Surrogate key and primary key are not competing concepts. A primary key is the minimal set of columns chosen to identify each record uniquely; a surrogate key is one possible choice for that primary key, as opposed to a natural key built from real-world attributes.
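To make the three definitions concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption: the DDL differs slightly, e.g. SQL Server would use an `IDENTITY` column rather than SQLite's `AUTOINCREMENT`). The tables `country`, `customer` and `order_line` and the helper `violates_key` are hypothetical, one table per kind of key.

```python
import sqlite3


def build_key_examples() -> sqlite3.Connection:
    """Create three hypothetical tables, one per kind of key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Natural key: the ISO country code exists in the real world.
        CREATE TABLE country (
            iso_code TEXT PRIMARY KEY,
            name     TEXT NOT NULL
        );
        -- Surrogate key: generated by the database, no business meaning.
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
            email       TEXT NOT NULL
        );
        -- Composite key: neither column alone is unique, but the pair is.
        CREATE TABLE order_line (
            order_no INTEGER NOT NULL,
            line_no  INTEGER NOT NULL,
            sku      TEXT NOT NULL,
            PRIMARY KEY (order_no, line_no)
        );
    """)
    return conn


def violates_key(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True
```

Note that the surrogate `customer_id` alone would happily accept two customers with the same email: it identifies rows, but it does not protect the natural attributes from duplication.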
I'd go for option 2. Keep it simple.
It ticks the boxes (narrow, numeric, unchanging, strictly monotonically increasing) for a useful clustered index (which is the default for PKs in SQL Server).
You still need to enforce uniqueness on (A,B,C,D), though, to preserve data integrity, as noted.
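A minimal sketch of option 2 as described here, again using Python's `sqlite3` as a stand-in for SQL Server 2005 (in SQL Server the surrogate would typically be an `INT IDENTITY` column and the natural key a `UNIQUE` constraint). The TEXT types for A, B, C, D, the `payload` column and the helper names `open_option_2` / `add_master` are assumptions for illustration.

```python
import sqlite3

# Assumption: A, B, C, D are TEXT; payload stands in for Slave's own data.
OPTION_2_DDL = """
CREATE TABLE master (
    id INTEGER PRIMARY KEY,                          -- surrogate key (rowid alias in SQLite)
    a TEXT NOT NULL, b TEXT NOT NULL,
    c TEXT NOT NULL, d TEXT NOT NULL,
    CONSTRAINT uq_master_abcd UNIQUE (a, b, c, d)    -- natural key still enforced
);
CREATE TABLE slave (
    id        INTEGER PRIMARY KEY,
    master_id INTEGER NOT NULL REFERENCES master (id),  -- narrow reference
    payload   TEXT
);
"""


def open_option_2() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(OPTION_2_DDL)
    return conn


def add_master(conn: sqlite3.Connection, a: str, b: str, c: str, d: str) -> int:
    """Insert a Master row and return its generated surrogate id."""
    cur = conn.execute(
        "INSERT INTO master (a, b, c, d) VALUES (?, ?, ?, ?)", (a, b, c, d)
    )
    return cur.lastrowid
```

Inserting the same (A,B,C,D) twice raises `sqlite3.IntegrityError` because of the unique constraint, while changing B later is a single-row `UPDATE master SET b = ? WHERE id = ?` that leaves Slave untouched.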
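There is nothing conceptually wrong with option 1, but as soon as you need more indexes on Master, the wide clustered key becomes a liability, because in SQL Server every nonclustered index carries the clustered key as its row locator. Alternatively, it means more work deciding which index should be the clustered one.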
Edit: in case of any confusion, the choice of which index is clustered is separate from the choice of key.
Your assumption (3) tends to suggest option (2) because it is inconvenient and potentially time-consuming to deal with cascading updates of the primary key of Master when B changes.
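For contrast, here is a sketch of option 1 under the same assumptions (SQLite via Python's `sqlite3` standing in for SQL Server; hypothetical column types and helper names). Slave carries the full composite key, and the foreign key is declared `ON UPDATE CASCADE`, so changing B on one Master row rewrites that value in every related Slave row.

```python
import sqlite3

OPTION_1_DDL = """
CREATE TABLE master (
    a TEXT NOT NULL, b TEXT NOT NULL,
    c TEXT NOT NULL, d TEXT NOT NULL,
    PRIMARY KEY (a, b, c, d)                 -- composite natural key
);
CREATE TABLE slave (
    id INTEGER PRIMARY KEY,
    a TEXT NOT NULL, b TEXT NOT NULL,
    c TEXT NOT NULL, d TEXT NOT NULL,
    payload TEXT,
    FOREIGN KEY (a, b, c, d) REFERENCES master (a, b, c, d)
        ON UPDATE CASCADE                    -- a change to B ripples into slave
);
"""


def open_option_1() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for the cascade to fire
    conn.executescript(OPTION_1_DDL)
    return conn


def change_b(conn: sqlite3.Connection, key: tuple, new_b: str) -> int:
    """Change B on one Master row; return how many Slave rows now carry new_b."""
    a, b, c, d = key
    conn.execute(
        "UPDATE master SET b = ? WHERE a = ? AND b = ? AND c = ? AND d = ?",
        (new_b, a, b, c, d),
    )
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM slave WHERE a = ? AND b = ? AND c = ? AND d = ?",
        (a, new_b, c, d),
    ).fetchone()
    return n
```

With three Slave rows under one key, `change_b` returns 3: a single logical change fans out into one write per child row, which is the recurring cost assumption (3) implies.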
Of course it depends on how often this will occur: if it is something that you expect to happen "all the time" then it suggests (A,B,C,D) is a poor choice of primary key; on the other hand, if it will only rarely happen, then (A,B,C,D) may be a good choice of primary key, and having those columns in Slave may have some advantages (no need to join to Master all the time to find out those column values).
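The flip side mentioned here can be sketched the same way (SQLite stand-in; the `master1`/`slave1` and `master2`/`slave2` table names are hypothetical labels for the option 1 and option 2 layouts). Under option 1 a query on Slave can read B directly; under option 2 it has to join to Master to get the same answer.

```python
import sqlite3


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Option 1 layout: slave1 carries the natural key columns itself.
        CREATE TABLE master1 (a TEXT, b TEXT, c TEXT, d TEXT, PRIMARY KEY (a, b, c, d));
        CREATE TABLE slave1 (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT,
                             payload TEXT,
                             FOREIGN KEY (a, b, c, d) REFERENCES master1 (a, b, c, d));
        -- Option 2 layout: slave2 stores only the surrogate master_id.
        CREATE TABLE master2 (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT,
                              UNIQUE (a, b, c, d));
        CREATE TABLE slave2 (id INTEGER PRIMARY KEY,
                             master_id INTEGER REFERENCES master2 (id), payload TEXT);
    """)
    return conn


def seed(conn: sqlite3.Connection) -> None:
    """Load the same logical data into both layouts."""
    conn.execute("INSERT INTO master1 VALUES ('a1', 'b1', 'c1', 'd1')")
    conn.execute("INSERT INTO slave1 (a, b, c, d, payload) VALUES ('a1', 'b1', 'c1', 'd1', 'x')")
    cur = conn.execute("INSERT INTO master2 (a, b, c, d) VALUES ('a1', 'b1', 'c1', 'd1')")
    conn.execute("INSERT INTO slave2 (master_id, payload) VALUES (?, 'x')", (cur.lastrowid,))


# Option 1: B is available directly on the Slave rows, no join needed.
Q1 = "SELECT payload, b FROM slave1 ORDER BY id"

# Option 2: B lives only on Master, so the query must join.
Q2 = """SELECT s.payload, m.b
        FROM slave2 AS s JOIN master2 AS m ON m.id = s.master_id
        ORDER BY s.id"""
```

Both queries return the same rows; the difference is only whether the join is paid on every read (option 2) or the cascade is paid on every change to B (option 1).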