Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it good practice to have foreign keys in a datawarehouse (relationships)?

I think the question is clear enough. Some of the columns in my datawarehouse table could have a relationship to a primary key. But is it good practice? It is denormalized, so it should never be deleted again (data in datawarehouse). Hope question is somewhat clear enough.

like image 385
Lieven Cardoen Avatar asked Apr 22 '10 12:04

Lieven Cardoen


People also ask

Are foreign keys a good practice?

Foreign keys are the way to use two different tables from the same schema, together. They allow you to merge two tables by using join paths that can be set in multiple different ways using the Chartio data analytics product, including the option to use foreign keys to join two tables from different schemas together.

Does every relationship need a foreign key?

Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity. Every relationship in the model must be supported by a foreign key.

Why are foreign keys important in relational databases?

Foreign keys put the “relational” in “relational database” – they help define the relationship between tables. They allow developers to maintain referential integrity across their database.

What are foreign keys good for?

A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link between the data in two tables to control the data that can be stored in the foreign key table.


2 Answers

I presume that you refer to FKs in fact tables. During DW loading, indexes and any foreign keys are dropped to speed up the loading -- the ETL process takes care of keys.

Foreign key constraint "activates" during inserts and updates (this is when it needs to check that the key value exists in the parent table) and during deletes of primary keys in parent tables. It does not play part during reads. Deleting records in a DW is (should) be a controlled process which scans for any existing relationships before deleting from dimension tables.

So, most DWs do not have foreign keys implemented as constraints.

like image 113
Damir Sudarevic Avatar answered Jan 06 '23 01:01

Damir Sudarevic


FK constraints work well in Kimball dimensional models on SQL Server.

Typically, your ETL will need to lookup into the dimension table (usually on the business key to handle slowly changing dimensions) to determine dimension surrogate IDs, and the dimension surrogate id is usually an identity, and the PK on the dimension is usually the dimension surrogate id, which is already an index (probably clustered).

Having RI at this point is not a huge of overhead with the writes, since it can also help catch ETL defects during development. Also, having the PK of the fact table being a combination of all the FKs can also help trap potential data modeling problems and double-loading.

It can actually reduce overhead on selects if you like to make general-use flattened views or table-valued functions of your star models. Because extra inner joins to dimensions are guaranteed to produce one and only one row, so the optimizer can use these constraints very effectively to eliminate the need to look up into the table. Without FK constraints, these lookups may have to be done to eliminate facts where the dimension does not exist.

like image 36
Cade Roux Avatar answered Jan 06 '23 00:01

Cade Roux