I'm getting my first exposure to data warehousing, and I’m wondering is it necessary to have foreign key constraints between facts and dimensions. Are there any major downsides for not having them? I’m currently working with a relational star schema. In traditional applications I’m used to having them, but I started to wonder if they were needed in this case. I’m currently working in a SQL Server 2005 environment.
UPDATE: For those interested I came across a poll asking the same question.
A star schema can have any number of dimension tables. The branches at the end of the links connecting the tables indicate a many-to-one relationship between the fact table and each dimension table.
Dimension tables have a surrogate ID column that is the primary key of that dimension. A fact table may use these dimension surrogate IDs as foreign keys to the dimension table.
A fact table contains dimension key columns that relate to dimension tables, and numeric measure columns. The dimension key columns determine the dimensionality of a fact table, while the dimension key values determine the granularity of a fact table.
Data redundancyStar schema stores redundant data in dimension tables, while snowflake schema fully normalizes dimension tables and avoids data redundancy.
Most data-warehouses (DW) do not have foreign keys implemented as constraints, because:
In general, foreign key constraint would trigger on: an insert into a fact table, any key-updates, and a delete from a dimension table.
During loading, indexes and constraints are dropped to speed-up the loading process, data integrity is enforced by the ETL application.
Once tables are loaded, DW is essentially read-only -- the constraint does not trigger on reads.
Any required indexes are re-built after the loading.
Deleting in a DW is a controlled process. Before deleting rows from dimensions, fact tables are queried for keys of rows to be deleted -- deleting is allowed only if those keys do not exists in any of fact tables.
Just in case, it is common to periodically run queries to detect orphan records in fact tables.
We use them, and we're happy with it.
Is it good practice to have foreign keys in a datawarehouse (relationships)?
There is overhead, but you can always disable the constraint during load and then re-enable it.
Having the constraint in place can catch ETL bugs and modelling defects.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With