The team I'm working with decided to create a table with a varchar primary key. Another table references this table through that primary key.
I'm in the habit of creating an integer primary key, following what I learnt at university. I've read that using an integer primary key gives a performance boost.
The problem is that I don't know of any other reason for creating an integer primary key. Do you have any tips?
No, the primary key does not have to be an integer; it's just very common that it is. As an example, we have user IDs here that can have leading zeroes and so must be stored in a varchar field. That field is used as the primary key in our Employee table.
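To illustrate the leading-zero point, here is a minimal Python sketch. The ID values are made up for illustration and are not taken from any real Employee table; the only thing shown is that converting such IDs to integers makes distinct IDs collide, while keeping them as text does not.

```python
# Hypothetical user IDs that differ only in their leading zeroes.
raw_ids = ["00123", "0123", "123"]

# Stored as integers, all three collapse to the same value.
as_int = {int(s) for s in raw_ids}

# Stored as text, they remain three distinct keys.
as_text = set(raw_ids)

print(len(as_int), len(as_text))
```

If the IDs come from outside the database and their textual form is significant, this is a reason to keep them in a character column.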
A VARCHAR column is not a good choice for a primary key, because we normally create the clustered index on the same column. A clustered index on a VARCHAR column is a bad choice because of the high fragmentation rate to be expected.
One of a set, e.g. 0-9: If these are the PKs of a lookup table (as they should be), use an int. If their meaning comes from outside the DB, consider using a VARCHAR (and smile if the hardware vendor upgrades from strictly numerical keypads to ones that also use # and *).
Data Type. Integer (number) data types are the best choice for a primary key, followed by fixed-length character data types. SQL Server processes number data type values faster than character data type values because it converts characters to their ASCII equivalent values before processing, which is an extra step.
VARCHAR vs. INT doesn't tell you much. What matters is the access pattern.
In absolute terms, a wider key will always be worse than a narrow key. The type carries absolutely no importance; it is the width that matters. Few types can beat INT in narrowness, though, so INT usually wins that argument simply because it is only 4 bytes wide.
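As a rough illustration of the width argument, the following Python sketch compares the raw size of a 4-byte integer key with a character key. The sample key `"CUSTOMER-000042"` and the row count are hypothetical, and the arithmetic ignores any per-row or per-column overhead a real engine adds, so treat the numbers as an order-of-magnitude estimate only.

```python
import struct

# A 32-bit signed integer, the usual size of an INT column.
int_width = struct.calcsize("<i")

# A hypothetical natural key stored as single-byte characters.
sample_key = "CUSTOMER-000042"
varchar_width = len(sample_key.encode("ascii"))

# The key is repeated in every referencing row and every index entry,
# so the difference scales with the number of rows (assumed here).
rows = 1_000_000
extra_bytes = (varchar_width - int_width) * rows

print(int_width, varchar_width, extra_bytes)
```

A short character key (for example a fixed 2-character code) would come out narrower than the INT, which is the point of the answer: compare widths, not type names.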
But what really matters is the choice of clustered key. Often confused with the primary key, the two represent different notions and are not required to overlap. A more detailed discussion is available in "Should I design a table with a primary key of varchar or int?". The choice of the clustered key is just about the most important decision in table design, and a mechanical application of an INT IDENTITY(1,1) on it may be just the biggest mistake one can make. Here is where the question of access patterns comes in:
Overall, there are many access patterns that can be ruined by using an INT IDENTITY clustered key. So before jumping to apply a cookie cutter solution, perhaps a little bit of analysis is required...
Some more general guidelines:
Note that there are no primary key design guidelines here, because the primary key is not an issue of storage design but an issue of modeling, and is entirely domain driven.
The primary key is supposed to represent the identity for the row and should not change over time.
I assume that the varchar is some sort of natural key, such as the name of the entity, an email address, or a serial number. If you use a natural key, it can sometimes happen that the key needs to change, for example because:
By using a surrogate key you avoid problems caused by having to change primary keys.
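The surrogate key argument can be sketched with Python's built-in `sqlite3` module. The `customer` and `orders` tables and the email addresses are hypothetical, and SQLite stands in for whatever database the team actually uses. The email plays the role of the natural key that changes; the integer `id` is the surrogate key that the other table references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,      -- surrogate key
    email TEXT NOT NULL UNIQUE      -- natural key, kept unique but not referenced
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);
""")

cur = conn.execute("INSERT INTO customer (email) VALUES (?)", ("old@example.com",))
customer_id = cur.lastrowid
conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))

# The natural value changes; only one row in one table is touched.
conn.execute("UPDATE customer SET email = ? WHERE id = ?",
             ("new@example.com", customer_id))

# The order still resolves to the same customer through the surrogate key.
email_after = conn.execute(
    "SELECT c.email FROM orders o JOIN customer c ON c.id = o.customer_id"
).fetchone()[0]
print(email_after)
conn.close()
```

Had `orders` referenced `customer(email)` instead, the same change would also have to be propagated to every referencing row, either by hand or through a cascading update.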