The team I'm working with decided to create a table with a varchar primary key. This table is referenced by another table through this primary key. I'm in the habit of creating an integer primary key, following what I learnt at university. I've read that using an integer primary key gives a performance boost. The problem is that I don't know any other reason for creating an integer primary key. Do you have any tips?


SQL primary key: integer vs varchar

2 Answers

VARCHAR vs. INT doesn't tell you much. What matters is the access pattern.

In absolute terms, a wider key will always be worse than a narrow key. The type carries absolutely no importance; it is the width that matters. Few types can beat INT in narrowness, though, so INT usually wins that argument simply because it is only 4 bytes wide.
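To put rough numbers on the width argument, here is a back-of-the-envelope sketch. It assumes the key value is stored several times per row (in the table itself, in each secondary index that carries it, and in the foreign key column of the referencing table). The row count, the number of copies, and the 40-byte average VARCHAR value are all made-up figures for illustration, and the estimate ignores per-row and page overhead.

```python
def key_storage_bytes(rows: int, key_width: int, copies_per_row: int) -> int:
    """Rough bytes spent on key values alone, ignoring per-row and page overhead."""
    return rows * key_width * copies_per_row

ROWS = 10_000_000     # assumed table size
COPIES_PER_ROW = 4    # assumed: the table, two secondary indexes, one foreign key column

int_bytes = key_storage_bytes(ROWS, 4, COPIES_PER_ROW)       # INT is 4 bytes wide
varchar_bytes = key_storage_bytes(ROWS, 40, COPIES_PER_ROW)  # assumed 40-byte average value

print(f"INT key:     {int_bytes / 1e6:.0f} MB")
print(f"VARCHAR key: {varchar_bytes / 1e6:.0f} MB")
```

Under these assumptions the wider key costs ten times the space, purely because of its width and regardless of its type.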

But what really matters is the choice of clustered key. It is often confused with the primary key, but the two represent different notions and are not required to overlap. Here is a more detailed discussion: Should I design a table with a primary key of varchar or int? The choice of the clustered key is just about the most important decision in table design, and a mechanical application of an INT identity(1,1) to it may be just the biggest mistake one can make. Here is where the question of access patterns comes in:

- what are the most frequent queries on the table?
  - what columns are projected?
  - what predicates are applied?
  - what ranges are searched?
  - what joins are performed?
  - what aggregations occur?
- how is the data inserted into the table?
- how is the data updated in the table?
- how is old data purged from the table, if ever?
- how many non-clustered indexes exist?
  - how often are the columns included in the NC indexes (key or leaf) updated?

Overall, there are many access patterns that can be ruined by using an INT IDENTITY clustered key. So before jumping to apply a cookie-cutter solution, perhaps a little bit of analysis is required.
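As one illustration of an access pattern that an identity clustered key serves poorly, the toy model below counts how many data pages hold the rows for a single customer under two different physical orderings. It is only a sketch: the table, the `order_id` and `customer_id` columns, the 100-rows-per-page figure, and the perfectly interleaved insert order are all assumptions made up for the example, and a real engine has many more factors in play.

```python
ROWS_PER_PAGE = 100   # assumed page capacity
N_ROWS = 10_000
N_CUSTOMERS = 100

# Each row is (order_id, customer_id); orders arrive interleaved across customers.
rows = [(order_id, order_id % N_CUSTOMERS) for order_id in range(N_ROWS)]

def pages_touched(rows, cluster_key, predicate):
    """Count distinct pages holding matching rows when rows are stored in cluster_key order."""
    stored = sorted(rows, key=cluster_key)
    return len({pos // ROWS_PER_PAGE
                for pos, row in enumerate(stored) if predicate(row)})

def wanted(row):
    return row[1] == 7   # "all orders for customer 7"

by_identity = pages_touched(rows, lambda r: r[0], wanted)          # clustered on order_id
by_customer = pages_touched(rows, lambda r: (r[1], r[0]), wanted)  # clustered on (customer_id, order_id)

print(by_identity, by_customer)  # prints "100 1"
```

In this model the same query reads every page of the table when the rows are stored in identity order, and a single page when they are stored by customer. Whether that trade is worth it depends on the other questions in the list, such as how the data is inserted and purged.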

Some more general guidelines:

- Clustered Index Design Guidelines
- Index Design Basics
- Unique Index Design Guidelines

Note that there are no primary key design guidelines, because the primary key is not an issue of storage design but an issue of modeling, and it is entirely domain driven.

answered Sep 18 '22 08:09

Remus Rusanu

The primary key is supposed to represent the identity for the row and should not change over time.

I assume that the varchar is some sort of natural key, such as the name of the entity, an email address, or a serial number. If you use a natural key, it can sometimes happen that the key needs to change because, for example:

- The data was entered incorrectly and needs to be fixed.
- The user changes their name or email address.
- Management suddenly decides that all customer reference numbers must be changed to another format for reasons that seem completely illogical to you, and insists on making the change even after you explain the problems it will cause you.
- A country or state decides to change the spelling of its name. This is very unlikely, but not impossible.

By using a surrogate key you avoid problems caused by having to change primary keys.
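A small sketch of the difference, using Python's built-in sqlite3 module rather than the database from the question. All table and column names here are hypothetical. With the natural key, the email is copied into every referencing row, so changing it has to rewrite those rows too (done here with ON UPDATE CASCADE). With the surrogate key, only the one customer row changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Natural key: the email is the primary key and is copied into every order.
CREATE TABLE customer_nat (email TEXT PRIMARY KEY);
CREATE TABLE order_nat (
    order_id       INTEGER PRIMARY KEY,
    customer_email TEXT NOT NULL
        REFERENCES customer_nat(email) ON UPDATE CASCADE
);

-- Surrogate key: orders reference a meaningless integer that never changes.
CREATE TABLE customer_sur (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE order_sur (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer_sur(customer_id)
);
""")

conn.execute("INSERT INTO customer_nat VALUES ('old@example.com')")
conn.executemany("INSERT INTO order_nat (customer_email) VALUES (?)",
                 [("old@example.com",)] * 3)
conn.execute("INSERT INTO customer_sur VALUES (1, 'old@example.com')")
conn.executemany("INSERT INTO order_sur (customer_id) VALUES (?)", [(1,)] * 3)

# The user changes their email address in both designs.
conn.execute("UPDATE customer_nat SET email = 'new@example.com'")
conn.execute("UPDATE customer_sur SET email = 'new@example.com' WHERE customer_id = 1")

# Natural key: every referencing row was rewritten by the cascade.
nat_refs = [r[0] for r in conn.execute("SELECT customer_email FROM order_nat")]
# Surrogate key: the referencing rows were not touched at all.
sur_refs = [r[0] for r in conn.execute("SELECT customer_id FROM order_sur")]

print(nat_refs)
print(sur_refs)
```

Without the cascade clause, the update on the natural-key table would be rejected while orders still reference the old email, which is the kind of problem the surrogate key sidesteps.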

answered Sep 18 '22 08:09

Mark Byers


Tags: performance, sql, indexing

frabiacca
