I would like to see an example of: <ul> <li>When this is appropriate</li> <li>When this is not appropriate</li> </ul> Is there a time when the choice of database would make a difference to the above examples?

This really seems to be a question about surrogate keys, which are typically either an auto-incrementing number or a GUID and hence a single column, vs. natural keys, which often require multiple pieces of information in order to be truly unique. If you are able to have a natural key that is only one column, then the point is moot anyway. Some people will insist on only using one or the other. Spend sufficient time working with production databases and you'll learn that there is no context-independent best practice. Some of these answers use SQL Server terminology, but the concepts are generally applicable to all DBMS products:
<hr>
<h3>Reasons to use single-column surrogate keys:</h3>
<ul>
<li>Clustered indexes. Inserts into a clustered index perform best when the database can simply append to it - otherwise, the DB has to do page splits. Note that this only applies if the key is sequential, i.e. either an auto-increment sequence or a sequential GUID. Arbitrary GUIDs will probably be much worse for performance.</li>
<li>Relationships. If your key is three, four, or five columns long, including character types and other non-compact data, every table that references it has to repeat all of those columns. If you have to create foreign key relationships to this key in 20 other tables, you end up wasting enormous amounts of space and reducing performance as a result.</li>
<li>Uniqueness. Sometimes you don't have a true natural key. Maybe your table is some sort of log, and it's possible for you to get two of the same event at the same time. Or maybe your real key is something like a materialized path that can only be determined after the row is already inserted. Either way, you always want your clustered index and/or primary key to be unique, so if you have no other truly unique information, you have no choice but to employ a surrogate key.</li>
<li>Compatibility. Most people will never have to deal with this, but if the natural key contains something like a <code>hierarchyid</code>, it's possible that some systems can't even read it. In this case, again, you must create a simple auto-generated surrogate key for use by these applications. Even if you don't have any "weird" data in the natural key, some DB libraries have a lot of trouble dealing with multi-column primary keys, although this problem is quickly going away.</li>
</ul>
<h3>Reasons to use multi-column natural keys:</h3>
<ul>
<li>Storage. Many people who work with databases never work with large enough ones to have to care about this factor. But when a table has billions or trillions of rows, you are going to want to keep the absolute minimum amount of data in this table that you possibly can, and a surrogate key is an extra column (and usually an extra index) on top of the natural-key columns you have to store anyway.</li>
<li>Replication. Yes, you can use a GUID, or a sequential GUID. But GUIDs have their own trade-offs, and if you can't or don't want to use a GUID for some reason, a multi-column natural key is a much better choice for replication scenarios because it is intrinsically globally unique - that is, you don't need a special algorithm to make it unique, it's unique by definition. This makes it very easy to reason about distributed architectures.</li>
<li>Insert/Update Performance. Surrogate keys aren't free. If you have a set of columns that are unique and frequently queried on, you need to create a covering index on these columns in addition to the surrogate key's own index. That index ends up being almost as large as the table, which wastes space and requires that a second index be updated every time you make any modifications. If it is ever possible for you to have only one index (the clustered index) on a table, you should do it!</li>
</ul>
<hr>
That's what comes to mind right off the bat. I'll update if I suddenly remember anything else.
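To make the "Relationships" and "Uniqueness" points above concrete, here is a minimal sketch of a multi-column natural key. It is not from the answer itself: the `enrollment` and `enrollment_note` tables, their columns, and the `try_insert` helper are hypothetical, and it runs on SQLite through Python's standard `sqlite3` module only so that it can be executed as-is. SQLite's `WITHOUT ROWID` option stores rows ordered by the primary key, which is only roughly analogous to a SQL Server clustered index; the syntax on other DBMS products will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
-- Hypothetical schema: the three natural columns together are the key.
CREATE TABLE enrollment (
    student_no  TEXT NOT NULL,
    course_code TEXT NOT NULL,
    term        TEXT NOT NULL,
    grade       TEXT,
    PRIMARY KEY (student_no, course_code, term)
) WITHOUT ROWID;  -- rows are stored in primary-key order, no hidden rowid

-- A referencing table has to repeat every column of the composite key.
CREATE TABLE enrollment_note (
    note_id     INTEGER PRIMARY KEY,
    student_no  TEXT NOT NULL,
    course_code TEXT NOT NULL,
    term        TEXT NOT NULL,
    note        TEXT NOT NULL,
    FOREIGN KEY (student_no, course_code, term)
        REFERENCES enrollment (student_no, course_code, term)
);
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ENROLL = "INSERT INTO enrollment VALUES (?, ?, ?, ?)"
NOTE = ("INSERT INTO enrollment_note (student_no, course_code, term, note) "
        "VALUES (?, ?, ?, ?)")

first = try_insert(ENROLL, ("S001", "DB101", "2024-1", None))
duplicate = try_insert(ENROLL, ("S001", "DB101", "2024-1", None))  # same key
orphan = try_insert(NOTE, ("S999", "DB101", "2024-1", "no such enrollment"))
linked = try_insert(NOTE, ("S001", "DB101", "2024-1", "late registration"))

print(first, duplicate, orphan, linked)  # True False False True
```

The key is unique by definition and needs no generator, but `enrollment_note` carries three extra columns just to point at its parent row. That is the space cost the "Relationships" bullet describes, multiplied by every table that references the key.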

I think it's almost always better (from an application developer standpoint, at least) to make the primary key an auto-generated key, and create a UNIQUE constraint and an index on the multiple columns. <ul> <li>With a single auto-generated primary key, you'll be able to easily add references to this table from other tables.</li> <li>Auto-generated primary keys work more simply with ORM libraries.</li> <li>Also, if your uniqueness constraints change in the future, you don't have to change the existing primary keys.</li> </ul> I've run into several headache-inducing situations because a DBA thought that a multiple-column primary key would always be sufficient, and future requirements changes proved this incorrect.
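The approach in this answer can be sketched roughly as follows. The schema, the index name `ux_enrollment_natural`, and the later `section` requirement are all made up for illustration, and SQLite (via Python's standard `sqlite3` module) is used only to keep the sketch runnable. A named unique index stands in for the UNIQUE constraint here because SQLite cannot drop a table-level constraint after the fact; other DBMS products would typically use `ALTER TABLE ... DROP CONSTRAINT` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
-- Hypothetical schema: auto-generated surrogate key plus a unique index
-- over the natural columns.
CREATE TABLE enrollment (
    enrollment_id INTEGER PRIMARY KEY,  -- auto-generated by SQLite
    student_no    TEXT NOT NULL,
    course_code   TEXT NOT NULL,
    term          TEXT NOT NULL
);
CREATE UNIQUE INDEX ux_enrollment_natural
    ON enrollment (student_no, course_code, term);

-- Referencing tables need only the single surrogate column.
CREATE TABLE enrollment_note (
    note_id       INTEGER PRIMARY KEY,
    enrollment_id INTEGER NOT NULL REFERENCES enrollment (enrollment_id),
    note          TEXT NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO enrollment (student_no, course_code, term) VALUES (?, ?, ?)",
    ("S001", "DB101", "2024-1"),
)
enrollment_id = cur.lastrowid
conn.execute(
    "INSERT INTO enrollment_note (enrollment_id, note) VALUES (?, ?)",
    (enrollment_id, "late registration"),
)

# The unique index still rejects a duplicate natural key.
try:
    conn.execute(
        "INSERT INTO enrollment (student_no, course_code, term) VALUES (?, ?, ?)",
        ("S001", "DB101", "2024-1"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Assumed requirement change: the same course can now be taken in several
# sections per term. Only the unique index changes; enrollment_id and every
# row that references it stay exactly as they were.
conn.executescript("""
ALTER TABLE enrollment ADD COLUMN section TEXT NOT NULL DEFAULT 'A';
DROP INDEX ux_enrollment_natural;
CREATE UNIQUE INDEX ux_enrollment_natural
    ON enrollment (student_no, course_code, term, section);
""")
conn.execute(
    "INSERT INTO enrollment (student_no, course_code, term, section) "
    "VALUES (?, ?, ?, ?)",
    ("S001", "DB101", "2024-1", "B"),
)

notes = conn.execute(
    "SELECT e.enrollment_id, n.note FROM enrollment_note n "
    "JOIN enrollment e ON e.enrollment_id = n.enrollment_id"
).fetchall()
print(duplicate_rejected, notes)
```

With a composite primary key, the same requirement change would also mean adding `section` to every referencing table and rebuilding their foreign keys. The trade-off is the one the other answer describes: the table now maintains two indexes instead of one.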

What are the pros and cons of using multi column primary keys?

2 Answers

answered Oct 13 '22 18:10

Aaronaught

answered Oct 13 '22 16:10

Kaleb Brasee

Tags:

sql

primary-key

composite-primary-key

Curtis Inderwiesche
