A question:
I have 2 tables:
Product
id INT
name VARCHAR(64)
something TEXT
else INT
entirely BOOL
and
Ingredient
id INT
name VARCHAR(64)
description TEXT
Now I also have a link table
Products_Ingredients
product_id INT
ingredient_id INT
for my many to many relation.
Now both products and ingredients will have unique names. So I can use names as natural keys... however will that be a good idea?
Say I have a product: Paint Thinner Supreme
with ingredient: Butylonitrotetrocycline
Will that be a good idea to use those names as composite key in the link table?
As much as I understand idea behind using natural keys over the surrogates, I kinda can't stop thinking that using simple integers as primary keys (and foreign ones) will be much faster. Will there be a difference in a way in which MySQL server digests those different keys?
What is your opinion?
In order to add a row with a given foreign key value, there must exist a row in the related table with the same primary key value. Surrogate keys join the dimension tables to the fact table. Surrogate keys serve as an important means of identifying each instance or entity inside of a dimension table.
A natural key might require several fields to accomplish a unique identity for each record. A surrogate key is unique in and of itself.
Natural key: an attribute that can uniquely identify a row, and exists in the real world. Surrogate key: an attribute that can uniquely identify a row, and does not exist in the real world. Composite key: more than one attribute that when combined can uniquely identify a row.
A surrogate key is a unique key for an entity in the client's business or for an object in the database. Sometimes natural keys cannot be used to create a unique primary key of the table. This is when the data modeler or architect decides to use surrogate or helping keys for a table in the LDM.
Opinions don't matter when you can measure.
I implemented this on PostgreSQL using both natural keys and surrogates. I used 300,000 total products, 180 ingredients, and populated two "product ingredient" tables with 3 to 17 ingredients per product, for 100,000 randomly selected products (1053462 rows).
Selecting all the ingredients for a single product using natural keys returned in 0.067 ms. Using surrogates, 0.199ms.
Returning all the non-id columns for a single product using natural keys returned in 0.145 ms. Using surrogates, 0.222 ms
So natural keys were about 2 to 3 times faster on this data set.
Natural keys don't require any joins to return this data. Surrogate keys require two joins.
The actual performance difference depends on the width of your tables, number of rows, page size, and length of names, and things like that. There will be a point where surrogate keys start outperforming natural keys, but few people try to measure that.
When I was designing the database for my employer's operational database, I built a testbed with tables designed around natural keys and with tables designed around id numbers. Both those schemas have more than 13 million rows of computer-generated sample data. In a few cases, queries on the id number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with id numbers took 30 seconds with natural keys.) But 80% of the test queries had faster SELECT performance against the natural key schema. And sometimes it was staggeringly faster--a difference of 30 to 1.
We expect natural keys to outperform surrogates in our database for years to come. (Unless we move certain tables over to an SSD, in which case natural keys will probably outperform surrogates forever.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With