Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In a star schema, are foreign key constraints between facts and dimensions neccessary?

I'm getting my first exposure to data warehousing, and I’m wondering is it necessary to have foreign key constraints between facts and dimensions. Are there any major downsides for not having them? I’m currently working with a relational star schema. In traditional applications I’m used to having them, but I started to wonder if they were needed in this case. I’m currently working in a SQL Server 2005 environment.

UPDATE: For those interested I came across a poll asking the same question.

like image 898
Garett Avatar asked May 12 '10 13:05

Garett


People also ask

What is the type of relationship in star schema between dimension and fact?

A star schema can have any number of dimension tables. The branches at the end of the links connecting the tables indicate a many-to-one relationship between the fact table and each dimension table.

Can dimension table have foreign keys?

Dimension tables have a surrogate ID column that is the primary key of that dimension. A fact table may use these dimension surrogate IDs as foreign keys to the dimension table.

What are the roles of fact and dimension tables in the star schema?

A fact table contains dimension key columns that relate to dimension tables, and numeric measure columns. The dimension key columns determine the dimensionality of a fact table, while the dimension key values determine the granularity of a fact table.

What is the difference between the fact table in a star schema and in a snowflake schema?

Data redundancyStar schema stores redundant data in dimension tables, while snowflake schema fully normalizes dimension tables and avoids data redundancy.


2 Answers

Most data-warehouses (DW) do not have foreign keys implemented as constraints, because:

  • In general, foreign key constraint would trigger on: an insert into a fact table, any key-updates, and a delete from a dimension table.

  • During loading, indexes and constraints are dropped to speed-up the loading process, data integrity is enforced by the ETL application.

  • Once tables are loaded, DW is essentially read-only -- the constraint does not trigger on reads.

  • Any required indexes are re-built after the loading.

  • Deleting in a DW is a controlled process. Before deleting rows from dimensions, fact tables are queried for keys of rows to be deleted -- deleting is allowed only if those keys do not exists in any of fact tables.

Just in case, it is common to periodically run queries to detect orphan records in fact tables.

like image 137
Damir Sudarevic Avatar answered Nov 12 '22 07:11

Damir Sudarevic


We use them, and we're happy with it.

Is it good practice to have foreign keys in a datawarehouse (relationships)?

There is overhead, but you can always disable the constraint during load and then re-enable it.

Having the constraint in place can catch ETL bugs and modelling defects.

like image 33
Cade Roux Avatar answered Nov 12 '22 08:11

Cade Roux