surrogate vs natural key: hard numbers on performance differences?

There's a healthy debate out there between surrogate and natural keys:

SO Post 1

SO Post 2

My opinion, which seems to be in line with the (slim) majority, is that you should use a surrogate key unless a natural key is completely obvious and guaranteed not to change, and that you should still enforce uniqueness on the natural key with a constraint. In practice that means surrogate keys almost all of the time.

Example of the two approaches, starting with a Company table:

1: Surrogate key: Table has an ID field which is the PK (and an identity). Company names are required to be unique by state, so there's a unique constraint there.

2: Natural key: Table uses CompanyName and State as the PK -- satisfies both the PK and uniqueness.
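To make the two layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only so the example is self-contained (the mention of an identity column suggests SQL Server), and the table names CompanySurrogate and CompanyNatural are hypothetical. It shows that both designs reject a duplicate (CompanyName, State) pair; they differ only in which columns other tables would reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Approach 1: surrogate key, with a unique constraint on the natural key.
CREATE TABLE CompanySurrogate (
    ID          INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for an identity column
    CompanyName TEXT NOT NULL,
    State       TEXT NOT NULL,
    UNIQUE (CompanyName, State)
);

-- Approach 2: natural key, the composite PK also enforces uniqueness.
CREATE TABLE CompanyNatural (
    CompanyName TEXT NOT NULL,
    State       TEXT NOT NULL,
    PRIMARY KEY (CompanyName, State)
);
""")


def insert_company(table, name, state):
    """Return True if the row was inserted, False if a uniqueness rule rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"INSERT INTO {table} (CompanyName, State) VALUES (?, ?)",
                (name, state),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = {}
for table in ("CompanySurrogate", "CompanyNatural"):
    results[table] = [
        insert_company(table, "Acme", "TX"),  # first row
        insert_company(table, "Acme", "OK"),  # same name, different state
        insert_company(table, "Acme", "TX"),  # duplicate of the first row
    ]

print(results)
```

In both tables the third insert is the only one refused. A child table would carry a single integer column to reference CompanySurrogate, but two text columns to reference CompanyNatural.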

Let's say that the Company PK is used in 10 other tables. My hypothesis, with no numbers to back it up, is that the surrogate key approach would be much faster here.

The only convincing argument I've seen for a natural key is a many-to-many table that uses its two foreign keys as a composite natural key. I think in that case it makes sense. But you can get into trouble if you need to refactor; that's out of scope for this post, I think.

Has anyone seen an article that compares performance differences on a set of tables that use surrogate keys vs. the same set of tables using natural keys? Looking around on SO and Google hasn't yielded anything worthwhile, just a lot of theorycrafting.

Important Update: I've started building a set of test tables to answer this question. It looks like this:

PartNatural - parts table that uses the unique PartNumber as a PK
PartSurrogate - parts table that uses an ID (int, identity) as PK and has a unique index on the PartNumber
Plant - ID (int, identity) as PK
Engineer - ID (int, identity) as PK

Every part is joined to a plant and every instance of a part at a plant is joined to an engineer. If anyone has an issue with this testbed, now's the time.
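One way such a testbed could be laid out is sketched below, again with Python's sqlite3 purely for self-containment. The four table names come from the list above; the link tables PartPlantNatural and PartPlantSurrogate, the Name columns, the row counts, and the part-number format are all assumptions invented for illustration. The sketch only checks that the two designs return the same rows and reports wall-clock time for each query; it makes no claim about which one is faster.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Plant    (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Engineer (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

CREATE TABLE PartNatural   (PartNumber TEXT PRIMARY KEY);
CREATE TABLE PartSurrogate (ID INTEGER PRIMARY KEY, PartNumber TEXT NOT NULL UNIQUE);

-- Hypothetical link tables: one row per part at a plant, with its engineer.
CREATE TABLE PartPlantNatural (
    PartNumber TEXT    NOT NULL REFERENCES PartNatural(PartNumber),
    PlantID    INTEGER NOT NULL REFERENCES Plant(ID),
    EngineerID INTEGER NOT NULL REFERENCES Engineer(ID),
    PRIMARY KEY (PartNumber, PlantID)
);
CREATE TABLE PartPlantSurrogate (
    PartID     INTEGER NOT NULL REFERENCES PartSurrogate(ID),
    PlantID    INTEGER NOT NULL REFERENCES Plant(ID),
    EngineerID INTEGER NOT NULL REFERENCES Engineer(ID),
    PRIMARY KEY (PartID, PlantID)
);
""")

# Assumed sizes; a real benchmark would need far more rows.
N_PARTS, N_PLANTS, N_ENGINEERS = 1000, 5, 10

conn.executemany("INSERT INTO Plant (ID, Name) VALUES (?, ?)",
                 [(i, f"Plant {i}") for i in range(1, N_PLANTS + 1)])
conn.executemany("INSERT INTO Engineer (ID, Name) VALUES (?, ?)",
                 [(i, f"Engineer {i}") for i in range(1, N_ENGINEERS + 1)])

for i in range(1, N_PARTS + 1):
    part_number = f"PN-{i:08d}"  # made-up part number format
    plant_id = i % N_PLANTS + 1
    engineer_id = i % N_ENGINEERS + 1
    conn.execute("INSERT INTO PartNatural (PartNumber) VALUES (?)", (part_number,))
    conn.execute("INSERT INTO PartSurrogate (ID, PartNumber) VALUES (?, ?)", (i, part_number))
    conn.execute("INSERT INTO PartPlantNatural VALUES (?, ?, ?)",
                 (part_number, plant_id, engineer_id))
    conn.execute("INSERT INTO PartPlantSurrogate VALUES (?, ?, ?)",
                 (i, plant_id, engineer_id))
conn.commit()

NATURAL_QUERY = """
SELECT p.PartNumber, pl.Name, e.Name
FROM PartNatural p
JOIN PartPlantNatural pp ON pp.PartNumber = p.PartNumber
JOIN Plant pl   ON pl.ID = pp.PlantID
JOIN Engineer e ON e.ID = pp.EngineerID
ORDER BY p.PartNumber
"""

SURROGATE_QUERY = """
SELECT p.PartNumber, pl.Name, e.Name
FROM PartSurrogate p
JOIN PartPlantSurrogate pp ON pp.PartID = p.ID
JOIN Plant pl   ON pl.ID = pp.PlantID
JOIN Engineer e ON e.ID = pp.EngineerID
ORDER BY p.PartNumber
"""


def timed(query):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(query).fetchall()
    return rows, time.perf_counter() - start


natural_rows, natural_secs = timed(NATURAL_QUERY)
surrogate_rows, surrogate_secs = timed(SURROGATE_QUERY)

print("same result set:", natural_rows == surrogate_rows)
print(f"natural: {natural_secs:.6f}s  surrogate: {surrogate_secs:.6f}s")
```

A single run on an in-memory database this small says little; meaningful numbers would need realistic row counts, repeated runs, and the target database engine.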

488

asked Aug 04 '09 18:08

jcollum

1 Answer

Use both! Natural keys prevent database corruption (inconsistency might be a better word). When the "right" natural key (the one that eliminates duplicate rows) would perform badly because of its length or the number of columns involved, a surrogate key can be added as well, to be used for foreign keys in other tables instead of the natural key. But the natural key should remain as an alternate key or unique index to prevent data corruption and enforce database consistency.

Much of the hoo-ha in the "debate" on this issue may be due to a false assumption: that you have to use the Primary Key for joins and for Foreign Keys in other tables. THIS IS FALSE. You can use ANY key as the target for foreign keys in other tables. It can be the Primary Key, an alternate key, or any unique index or unique constraint, as long as it is unique in the target relation (table). And as for joins, you can use anything at all as a join condition; it doesn't have to be a key, or an index, or even unique (although if it is not unique, you will get multiple matching rows out of the Cartesian product the join creates). You can even create a join using a non-equality criterion (like >, <, or "like") as the join condition.

Indeed, you can create a join using any valid SQL expression that evaluates to a boolean.
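Both points can be illustrated with a short sketch, here using Python's sqlite3 module as an assumed stand-in for whatever engine is actually in use. The tables Part, Stock, and PriceBand and their columns are hypothetical. Stock has a foreign key that targets the unique PartNumber column rather than the primary key, and the final query joins on a range comparison instead of equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Part (
    ID         INTEGER PRIMARY KEY,   -- surrogate key
    PartNumber TEXT NOT NULL UNIQUE   -- natural key kept as an alternate key
);

-- The foreign key targets the unique alternate key, not the primary key.
CREATE TABLE Stock (
    PartNumber TEXT    NOT NULL REFERENCES Part(PartNumber),
    Quantity   INTEGER NOT NULL
);

CREATE TABLE PriceBand (
    Label  TEXT    NOT NULL,
    MinQty INTEGER NOT NULL,
    MaxQty INTEGER NOT NULL
);
""")

with conn:
    conn.executemany("INSERT INTO Part (ID, PartNumber) VALUES (?, ?)",
                     [(1, "A-100"), (2, "B-200")])
    conn.executemany("INSERT INTO Stock (PartNumber, Quantity) VALUES (?, ?)",
                     [("A-100", 5), ("B-200", 50)])
    conn.executemany("INSERT INTO PriceBand (Label, MinQty, MaxQty) VALUES (?, ?, ?)",
                     [("small", 1, 9), ("bulk", 10, 999)])

# A Stock row pointing at a part number that does not exist is rejected.
try:
    with conn:
        conn.execute("INSERT INTO Stock (PartNumber, Quantity) VALUES (?, ?)",
                     ("Z-999", 1))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Join on a boolean expression that is neither an equality nor a key lookup.
banded = conn.execute("""
    SELECT s.PartNumber, b.Label
    FROM Stock s
    JOIN PriceBand b ON s.Quantity >= b.MinQty AND s.Quantity <= b.MaxQty
    ORDER BY s.PartNumber
""").fetchall()

print(orphan_rejected, banded)
```

Most engines require the referenced column set to carry a primary key or unique constraint, which is the condition the answer describes.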

128

answered Sep 22 '22 14:09

Charles Bretana

Tags: database, key, primary-key, database-design, database-performance
