
Is a fact table in normalized or de-normalized form?

I did a bit of research on whether fact tables are normalized or de-normalized, and came across some findings that confuse me.

According to Kimball:

Dimensional models combine normalized and denormalized table structures. The dimension tables of descriptive information are highly denormalized with detailed and hierarchical roll-up attributes in the same table. Meanwhile, the fact tables with performance metrics are typically normalized. While we advise against a fully normalized with snowflaked dimension attributes in separate tables (creating blizzard-like conditions for the business user), a single denormalized big wide table containing both metrics and descriptions in the same table is also ill-advised.

The other finding, which I also think is reasonable, is by fazalhp at GeekInterview:

The main funda of DW is de-normalizing the data for faster access by the reporting tool...so if ur building a DW ..90% it has to be de-normalized and off course the fact table has to be de normalized...

So my question is: are fact tables normalized or de-normalized? Whichever it is, how and why?

Aditya asked Mar 28 '14 04:03



2 Answers

From the point of view of relational database design theory, dimension tables are usually in 2NF and fact tables anywhere between 2NF and 6NF.

However, dimensional modelling is a methodology unto itself, tailored to:

  • one use case, namely reporting

  • mostly one basic type (pattern) of a query

  • one main user category -- business analyst, or similar

  • row-store RDBMS like Oracle, SQL Server, Postgres ...

  • one independently controlled load/update process (ETL); all other clients are read-only
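
As a rough illustration of that one basic query pattern, here is a minimal sketch of a star join using Python's built-in sqlite3 module. The table names, column names (dim_product, dim_date, fact_sales) and the sample rows are all made up for the example. The dimension keeps its roll-up attributes (category, department) in the same table, while the fact table holds only keys and measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT,  -- roll-up attribute kept in the same table
    department   TEXT   -- depends on category, so the table is not in 3NF
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- hypothetical smart key, e.g. 20140328
    full_date TEXT,
    month     TEXT
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,  -- measure
    amount      REAL      -- measure
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
    (1, "Espresso beans", "Coffee", "Beverages"),
    (2, "Green tea", "Tea", "Beverages"),
    (3, "Paper filters", "Accessories", "Equipment"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20140327, "2014-03-27", "2014-03"),
    (20140328, "2014-03-28", "2014-03"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20140327, 1, 2, 20.0),
    (20140327, 2, 1, 5.0),
    (20140328, 1, 1, 10.0),
    (20140328, 3, 4, 8.0),
])


def sales_by_department(conn):
    """Join the fact table to its dimensions, group by dimension attributes."""
    return conn.execute("""
        SELECT p.department, d.month, SUM(f.amount) AS total_amount
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d    ON d.date_key = f.date_key
        GROUP BY p.department, d.month
        ORDER BY p.department
    """).fetchall()


report = sales_by_department(conn)
print(report)
```

Almost every report against such a schema has this shape: filter and group by dimension attributes, aggregate the fact measures.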

There are other DW design methodologies out there, like

  • Inmon's -- data structure driven

  • Data Vault -- data structure driven

  • Anchor modelling -- schema evolution driven

The main thing is not to mix up database design theory with a specific design methodology. You may look at a certain methodology from the perspective of database design theory, but you have to study each methodology separately.

Damir Sudarevic answered Sep 22 '22 04:09


Most people working with a data warehouse are familiar with transactional RDBMSs and apply various levels of normalization, so those concepts get used to describe working with a star schema. What those descriptions are really trying to do is get you to unlearn your normalization habits. This can get confusing because there is a tendency to focus on what "not" to do.

The fact table(s) will probably be the most normalized, since they usually contain just numerical values along with various IDs for linking to dimensions. The key with fact tables is how granular you need your data to be. For Purchases, for example, the grain could be individual line items by product in an order, or totals aggregated at a daily, weekly, or monthly level.
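
To make the grain choice concrete, here is a small sketch (again Python with sqlite3) that stores Purchases at line-item grain and rolls them up to one row per product per day. The table names fact_order_line and fact_daily_product_sales and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Finest grain: one row per line item in an order
CREATE TABLE fact_order_line (
    order_id    INTEGER,
    line_no     INTEGER,
    date_key    INTEGER,
    product_key INTEGER,
    quantity    INTEGER,
    amount      REAL,
    PRIMARY KEY (order_id, line_no)
);
-- Coarser grain: one row per product per day
CREATE TABLE fact_daily_product_sales (
    date_key    INTEGER,
    product_key INTEGER,
    quantity    INTEGER,
    amount      REAL,
    PRIMARY KEY (date_key, product_key)
);
""")

conn.executemany("INSERT INTO fact_order_line VALUES (?, ?, ?, ?, ?, ?)", [
    (1001, 1, 20140327, 1, 2, 20.0),
    (1001, 2, 20140327, 2, 1, 5.0),
    (1002, 1, 20140327, 1, 1, 10.0),
    (1003, 1, 20140328, 1, 3, 30.0),
])

# Roll the line items up to the daily grain
conn.execute("""
    INSERT INTO fact_daily_product_sales
    SELECT date_key, product_key, SUM(quantity), SUM(amount)
    FROM fact_order_line
    GROUP BY date_key, product_key
""")

daily = conn.execute("""
    SELECT date_key, product_key, quantity, amount
    FROM fact_daily_product_sales
    ORDER BY date_key, product_key
""").fetchall()
print(daily)
```

The daily table is smaller and faster to scan, but it can no longer answer questions about individual orders, which is why the grain has to follow from the reports you need.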

My suggestion is to keep searching and studying how to design a warehouse based on your needs. Don't aim for high normal forms. Think more about the reports you want to generate and the analysis capabilities you want to give your users.

JeffO answered Sep 21 '22 04:09