Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Composite vs Surrogate keys for Referential Integrity in 6NF

Tags:

sql

database

6nf

Take three layers of information:

Layer 1: Information

This layer contains data with UNIQUE natural indexes and a surrogate key that is easily transferrable.

Table Surnames:

+-----------------------------+--------------+
|    ID (Auto Increment, PK)  |    Surname   |
+-----------------------------+--------------+
|               1             |     Smith    |
|               2             |    Edwards   |
|               3             |     Brown    |
+-----------------------------+--------------+

Table FirstNames

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   FirstName  |
+-----------------------------+--------------+
|               1             |     John     |
|               2             |     Bob      |
|               3             |     Mary     |
|               4             |     Kate     |
+-----------------------------+--------------+

Natural Keys

Alternatively, the two tables above can be without ID and utilize Surname and FirstName as Natural Primary Keys, as explained by Mike Sherrill. In this instance, assume the layer below references varchar rather than int.

Layer 2: People

In this layer a composite index is used. This value can be UNIQUE or PRIMARY, depending on whether a surrogate key is utilized as the Primary Key.

+-----------------+--------------+
|    FirstName    |    LastName  |
+-----------------+--------------+
|        1        |       2      |
|        1        |       3      |
|        2        |       3      |
|        3        |       1      |
|        4        |       2      |
|       ...       |      ...     |
+-----------------+--------------+

Layer 3: Parents

In this layer, relationships between people are explored through a ParentsOf table.

ParentsOf

+-----------------+-----------------+
|      Person     |   PersonParent  |
+-----------------+-----------------+

 OR

+-----------------+-----------------+-----------------+-----------------+
| PersonFirstName |  PersonSurname  | ParentFirstName |  ParentSurname  |
+-----------------+-----------------+-----------------+-----------------+

The Question

Assuming that referential integrity is VERY important to me at its very core, and I will have FOREIGN KEYS on these indexes so that I keep the database responsible for monitoring its own integrity on this front, and that, if I were to use an ORM, it would be one like Doctrine which has native support for Compound Primary Keys...

Please help me to understand:

  • The list of trade-offs that take place with utilizing surrogate keys vs. natural keys on the 1st Layer.

  • The list of trade-offs that take place with utilizing compound keys vs. surrogate keys on the 2nd Layer which can be transferred over to the 3rd Layer.

I am not interested in hearing which is better, because I understand that there are significant disagreements among professionals on this topic and it would be sparking a religious war. Instead, I am asking, very simply and as objectively as is humanly possible, what trade-offs will you be taking by passing surrogate keys to each Layer vs maintaining Primary keys (natural/composite, or surrogate/composite). Anyone will be able to find someone saying NEVER or ALWAYS use surrogate keys on SO and other websites. Instead, a reasoned analyses of trade-offs is what I will most appreciate in your answers.

EDIT: It has been pointed out that a surname example is a poor example for a use of 6NF. For the sake of keeping the question intact, I am going to leave it be. If you are having trouble imagining the use case for this, a better one might be a list of "Grocery Items". AKA:

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Grocery    |
+-----------------------------+--------------+
|               1             | Sponges      |
|               2             | Tomato Soup  |
|               3             | Ice Cream    |
|               4             | Lemons       |
|               5             | Strawberries |
|               6             | Whipped Cream|
+-----------------------------+--------------+

+-----------------------------+--------------+
|   ID (Auto Increment, PK)   |   Brand      |
+-----------------------------+--------------+
|               1             | Bright       |
|               2             | Ben & Jerry's|
|               3             | Store Brand  |
|               4             | Campbell's   |
|               5             | Cool Whip    |
+-----------------------------+--------------+    

Natural Composite Key Example:

+-----------------------------+--------------+
|           Grocery           |   Brand      |
+-----------------------------+--------------+
|           Sponges           | Bright       |
|           Ice Cream         | Ben & Jerry's|
|           Ice Cream         | Store Brand  |
|           Tomato Soup       | Campbell's   |
|           Tomato Soup       | Store Brand  |
|           Lemons            | Store Brand  |
|           Whipped Cream     | Cool Whip    |
+-----------------------------+--------------+ 

Recommended Pairings

+-----------------+-----------------+-----------------+-----------------+
|     Grocery1     |  Brand1        | Grocery2        |  Brand2         |
+-----------------+-----------------+-----------------+-----------------+

To reiterate, this is also just an example. This is not how I would recommend proceeding, but it should help to illustrate my question.

There ARE shortfalls to this method. I'll reiterate that this question was to request walking through the benefits and drawbacks of each method below, not to highlight one as better than another. I believe most people were able to look past the questionable nature of this specific example to answer the core question. This edit is for those that cannot.

There are some very good answers below and if you are curious about which direction to go, please read them.

END EDIT

Thank you!

like image 617
smcjones Avatar asked May 24 '14 22:05

smcjones


People also ask

What is surrogate key and composite key?

A composite key is simply a key that contains two or more columns. A surrogate key is one which is not naturally unique to the data but is added or imposed onto the data for (hopefully) a good reason. Examples can include IDENTITY columns in SQL Server - known as autonumbers sometimes in other products.

Which is better a natural key or a surrogate key explain your reasoning?

1: A primary key value must be unique A primary key uniquely identifies each record within a table and relates records to additional data stored in other tables. A natural key might require several fields to accomplish a unique identity for each record. A surrogate key is unique in and of itself.

What is the difference between natural keys Composite keys and surrogate keys?

Natural key: an attribute that can uniquely identify a row, and exists in the real world. Surrogate key: an attribute that can uniquely identify a row, and does not exist in the real world. Composite key: more than one attribute that when combined can uniquely identify a row.


1 Answers

Here's some trade-offs:

Single Surrogate (artificially created):

  • All child tables foreign keys only need a single column to reference the primary key.

  • Very easy to update the natural keys in table without needing to update every child table with foreign keys

  • Smaller primary/foreign key indexes (ie. not a wide) This can make the database run faster, for example when a record is deleted in a parent table, the child tables need to be searched to make sure this will not create orphans. Narrow indexes are faster to scan (just sightly).

  • you will have more indexes because you most likely will also want to index whatever natural keys exists in the data.

Natural composite keyed tables:

  • fewer indexes in the database

  • less columns in the database

  • easier/faster to insert a ton of records as you will not need to grab the sequence generator

  • updating one of the keys in the compound requires that every child table also be updated.

Then there is another category: artificial composite primary keys

I've only found one instance where this makes sense. When you need to tag every record in every table for row level security.

For example, suppose you had an database which stored data for 50,000 clients and each client was not supposed to see other client's data--very common in web application development.

If each record was tagged with a client_id field, you are creating a row level security environment. Most databases have the tools to enforce row level security when setup correctly.

First thing to do is setup primary and foreign keys. Normally a table with have an id field as the primary key. By adding client_id the key is now composite key. And it is necessary to carry client_id to all child table.

The composite key is based on 2 surrogate keys and is a bulletproof way to ensure data integrity among clients and within the database a whole.

After this you would create views (or if using Oracle EE setup Virtual Private Database) and other various structures to allow the database to enforce row level security (which is a topic all it own).

Granted that this data structure is no longer normalized to the nth degree. The client_id field in each pk/fk denormalizes an otherwise normal model. The benefit of the model is the ease of enforcing row level security at the database level (which is what databases should do). Every select, insert, update, delete is restricted to whatever client_id your session is currently set. The database has session awareness.

Summary

Surrogate keys are always the safe bet. They require a little more work to setup and require more storage.

The biggest benefit in my opinion is:

  • Being able to update the PK in one table and all other child tables are instantaneously changed without ever being touched.

  • When data gets messed up--and it will at some point due to a programming mistake, surrogate keys make the clean up much much easier and in some cases only possible to do because there are surrogate keys.

  • Query performance is improved as the db is able to search attributes to locate the s.key and then join all child table by a single numeric key.

Natural Keys especially composite NKeys make writing code a pain. When you need to join 4 tables the "where clause" will be much longer (and easier to mess up) than when single SKeys were used.

Surrogate keys are the "safe" route. Natural keys are beneficial in a few places, I'd say around 1% of the tables in a db.

like image 155
Brian McGinity Avatar answered Sep 30 '22 10:09

Brian McGinity