Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a concept of slowly changing FACT in data warehouse

In data warehousing, we have the concept of slowly changing dimensions. I am just wondering why there is no jargon for 'slowly/rapidly changing FACTs' because the same Type1, Type 2 measures can be used to track changes in the FACT table.

like image 888
Victor Avatar asked Feb 06 '14 15:02

Victor


1 Answers

According to the DW gods there are 3 types of FACT tables

  • Transaction: your basic measurements with dim references. measurements not rolled up or summarized, lots of DIM relations
  • Periodic: Rolled up summaries of transaction fact tables over a defined period of time.
  • Accumulating Snapshot: measurements associated with a 2+ defined time periods

From these we have at least 2 options that will result in something pretty similar to a slowly changing fact table. It all depends on how your source system is set up.

Option 1: Transactional based Source System

If your source system tracks changes to measurements via a series of transactions (ie, initial value + change in value + change value etc) then each of these transactions ends up in the transactional fact. This is then used by the periodic fact table to reflect the 'as of period' measures.

For example, if your source system tracks money in and out of an account you would probably have a transaction fact table that pretty much mirrored the source money in/out table. You would also have a periodic fact table that would be updated every period (in this case month) to reflect the total value of the account for that period

The periodic fact table is your Slowly Changing Fact table.

Source               DW_Tansaction_Fact       DW_Periodic_Fact
---------------  ->  -------------------  ->  --------------------
Acnt1 Jan +10$       Acnt1 Jan +10$           Acnt1 Jan 10$
Acnt1 Feb -1 $       Acnt1 Feb -1 $           Acnt1 Feb  9$
Acnt1 Apr +2 $       Acnt1 Apr +2 $           Acnt1 Mar  9$
                                              Acnt1 Apr 11$

Option 2: CRUD/Overwriting Source System

Its more likely you have a source system that lets users directly update/replace the business measurements. At any point in time, according to the source system, there was and is only one value for each measure. You can make this transaction by some clever trickery in your ETL process but your only ever going to get a transaction window limited by your ETL schedule.

In this case you could go either with a Periodic Fact table OR an Accumulating fact table.

Lets stick with our account example, but instead of transactions the table just stores an amount value against each account. This is updated as required in the source system so that for Acnt1, in January it was 10$, February 9$ and April 11$

Sticking the the transaction and period fact tables we would end up with this data (As at end of April). Again, The periodic fact table is your Slowly Changing Fact table.

DW_Tansaction_Fact       DW_Periodic_Fact
 -------------------  ->  --------------------
Acnt1 11$                 Acnt1-Jan-10$
                          Acnt1-Feb-09$
                          Acnt1-Mar-09$
                          Acnt1-Apr-11$

But we could also go with with an Accumulating Fact table which could contain all month values for a given year.

DW_Accumlative_Fact_CrossTab
Year Acnt  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001 Acnt1 10  9   9   11  -   -   -   -   -   -   -   -

Or a more type3-ish version

DW_Accumlative_Fact_CrossTab
Acnt    Year   YearStartVal   CurrentVal
Acnt1   2001   10             9

Kindof relevant

In my experience, this sort of question comes up when this common business scenario:

  1. There is a Core Business System with a DATABASE.
  2. Business Periodically Issues Reports that summaries values by time periods from Core Business System
  3. Core Business System allows retrospective updating of data - This is handled by overwriting values.
  4. Business demands to know why the January figures in the same report run in June no longer match the January figures from the report run in February.

Note that you are now dealing with FOUR sets of time (Initial period of report, measurement at date of initial period, current report period, measurement at current period) which will be hard enough for you to explain let alone your end users to understand.

Try to step back, explain to your end users which business measures change over time, listen to what results they want and build your facts accordingly. Note that you may end up with multiple fact tables for the same measure, that is OK and good.

Reference:

  • http://www.kimballgroup.com/2008/11/fact-tables/
  • http://www.zentut.com/data-warehouse/fact-table/
like image 191
Joe Avatar answered Sep 27 '22 23:09

Joe