When I am creating a new database table, what factors should I take into account for selecting the primary key's data type?

Sorry to do that, but I found that the answers I gave to related questions (you can check this and this) could apply to this one. I reshaped them a little bit... You will find many posts dealing with this issue, and each choice you'll make has its pros and cons. Arguments for these usually refer to relational database theory and database performance. On this subject, my point is very simple: surrogate primary keys ALWAYS work, while Natural keys MIGHT NOT ALWAYS work one of these days, and this for multiple reasons: field too short, rules change, etc. To this point, you've guessed here that I am basically a member of the uniqueIdentifier/surrogate primary key team, and even if I appreciate and understand arguments such as the ones presented here, I am still looking for the case where "natural" key is better than surrogate ... In addition to this, one of the most important but always forgotten arguments in favor of this basic rule is related to code normalization and productivity: each time I create a table, shall I lose time <ol> <li>identifying its primary key and its physical characteristics (type, size)</li> <li>remembering these characteristics each time I want to refer to it in my code?</li> <li>explaining my PK choice to other developers in the team?</li> </ol> My answer is no to all of these questions: <ol> <li>I have no time to lose trying to identify "the best Natural Primary Key" when the surrogate option gives me a bullet-proof solution.</li> <li>I do not want to remember that the Primary Key of my Table_whatever is a 10 characters long string when I write the code.</li> <li>I don't want to lose my time negotiating the Natural Key length: "well if You need 10 why don't you take 12 to be on the safe side?". This "on the safe side" argument really annoys me: If you want to stay on the safe side, it means that you are really not far from the unsafe side! Choose surrogate: it's bullet-proof!</li> </ol> So I've been working for the last five years with a very basic rule: each table (let's call it 'myTable') has its first field called <code>'id_MyTable'</code> which is of uniqueIdentifier type. Even if this table supports a "many-to-many" relation, where a field combination offers a very acceptable Primary Key, I prefer to create this <code>'id_myManyToManyTable'</code> field being a uniqueIdentifier, just to stick to the rule, and because, finally, it does not hurt. The major advantage is that you don't have to care anymore about the use of Primary Key and/or Foreign Key within your code. Once you have the table name, you know the PK name and type. Once you know which links are implemented in your data model, you'll know the name of available foreign keys in the table. And if you still want to have your "Natural Key" somewhere in your table, I advise you to build it following a standard model such as <pre class="prettyprint"><code>Tbl_whatever id_whatever, unique identifier, primary key code_whatever, whateverTypeYouWant(whateverLengthYouEstimateTheRightOne), indexed ..... </code></pre> Where id_ is the prefix for primary key, and code_ is used for "natural" indexed field. Some would argue that the code_ field should be set as unique. This is true, and it can be easily managed either through DDL or external code. Note that many "natural" keys are calculated (invoice numbers), so they are already generated through code I am not sure that my rule is the best one. But it is a very efficient one! If everyone was applying it, we would for example avoid time lost answering to this kind of question!

What should I consider when selecting a data type for my primary key?

2 Answers

Sorry to do that, but I found that the answers I gave to related questions (you can check this and this) could apply to this one. I reshaped them a little bit...

You will find many posts dealing with this issue, and each choice you'll make has its pros and cons. Arguments for these usually refer to relational database theory and database performance.

On this subject, my point is very simple: surrogate primary keys ALWAYS work, while Natural keys MIGHT NOT ALWAYS work one of these days, and this for multiple reasons: field too short, rules change, etc.

To this point, you've guessed here that I am basically a member of the uniqueIdentifier/surrogate primary key team, and even if I appreciate and understand arguments such as the ones presented here, I am still looking for the case where "natural" key is better than surrogate ...

In addition to this, one of the most important but always forgotten arguments in favor of this basic rule is related to code normalization and productivity:

each time I create a table, shall I lose time

identifying its primary key and its physical characteristics (type, size)
remembering these characteristics each time I want to refer to it in my code?
explaining my PK choice to other developers in the team?

My answer is no to all of these questions:

I have no time to lose trying to identify "the best Natural Primary Key" when the surrogate option gives me a bullet-proof solution.
I do not want to remember that the Primary Key of my Table_whatever is a 10 characters long string when I write the code.
I don't want to lose my time negotiating the Natural Key length: "well if You need 10 why don't you take 12 to be on the safe side?". This "on the safe side" argument really annoys me: If you want to stay on the safe side, it means that you are really not far from the unsafe side! Choose surrogate: it's bullet-proof!

So I've been working for the last five years with a very basic rule: each table (let's call it 'myTable') has its first field called 'id_MyTable' which is of uniqueIdentifier type. Even if this table supports a "many-to-many" relation, where a field combination offers a very acceptable Primary Key, I prefer to create this 'id_myManyToManyTable' field being a uniqueIdentifier, just to stick to the rule, and because, finally, it does not hurt.

The major advantage is that you don't have to care anymore about the use of Primary Key and/or Foreign Key within your code. Once you have the table name, you know the PK name and type. Once you know which links are implemented in your data model, you'll know the name of available foreign keys in the table.

And if you still want to have your "Natural Key" somewhere in your table, I advise you to build it following a standard model such as

Tbl_whatever

   id_whatever, unique identifier, primary key
   code_whatever, whateverTypeYouWant(whateverLengthYouEstimateTheRightOne), indexed
   .....

Where id_ is the prefix for primary key, and code_ is used for "natural" indexed field. Some would argue that the code_ field should be set as unique. This is true, and it can be easily managed either through DDL or external code. Note that many "natural" keys are calculated (invoice numbers), so they are already generated through code

I am not sure that my rule is the best one. But it is a very efficient one! If everyone was applying it, we would for example avoid time lost answering to this kind of question!

108

answered Sep 23 '22 12:09

Philippe Grondier

If using a numeric key, make sure the datatype is giong to be large enough to hold the number of rows you might expect the table to grow to.

If using a guid, does the extra space needed to store the guid need to be considered? Will coding against guid PKs be a pain for developers or users of the application.

If using composite keys, are you sure that the combined columns will always be unique?

answered Sep 20 '22 12:09

Ely

Related questions
                            
                                Do MySQL indexes still work after inserting new rows?
                            
                                SQL Multiple select count statements in one query
                            
                                Custom post type yearly/ monthly archive
                            
                                How would I use excel to generate a large update sql statement?
                            
                                DB::update laravel 5 raw query
                            
                                Cursor while loop returning every value but the last
                            
                                SQL Server DELETE is slower with indexes
                            
                                Array of arrays in PostgreSQL
                            
                                Dividing 2 numbers in Sql Server
                            
                                Convert a 12 hour format to 24 hour format in sql server
                            
                                How to get number of days in a month in SQL Server
                            
                                JOIN and LEFT JOIN equivalent in LINQ
                            
                                Selecting records in SQL based on another table's contents
                            
                                SQLServer - Select bool if column begins with a string
                            
                                Splitting Date into 2 Columns (Date + Time) in SQL
                            
                                Get row count of all tables in database: SQL Server [duplicate]
                            
                                SQL Server - select substring of all characters following last hyphen
                            
                                Why is LAST_VALUE() not working in SQL Server?
                            
                                In SQL, what is the letter after a table name in a select statement?
                            
                                Selecting all items in one table and join with another table, allowing nulls

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What should I consider when selecting a data type for my primary key?

Tags:

sql

database-design

Aaron Fischer

People also ask

2 Answers

Philippe Grondier

Ely

Recent Activity

Donate For Us