
How to create history fact table?

I have some entities in my Data Warehouse:

  1. Person - with attributes personId, dateFrom, dateTo, and others that can change (e.g. last name, birth date, and so on); a slowly changing dimension

  2. Document - documentId, number, type

  3. Address - addressId, city, street, house, flat

The relationship between Person and Document is one-to-many, and the relationship between Person and Address is many-to-many.

My goal is to create a history fact table that can answer the following questions:

  1. Which persons, with which documents, lived at a given address on a given date?

  2. What history of residents does a given address have over a given time interval?

This is not the only thing the DW is designed for, but I think it is the hardest part of its design.

For example, Miss Brown (personId=1), with documents documentId=1 and documentId=2, lived at addressId=1 from 01/01/2005 to 02/02/2010 and then moved to addressId=2, where she has lived from 02/03/2010 to the current date (NULL?). Meanwhile, she changed her last name to Green on 04/05/2006 and replaced her first document (documentId=1) with documentId=3 on 06/07/2007. Mr Black (personId=2, documentId=4) has lived at addressId=1 from 02/03/2010 to the current date.

The expected result of the query for question 2, with addressId=1 and a time interval from 01/01/2000 to now, should look like this:

Rows:

last_name="Brown", documentId=1, dateFrom=01/01/2005, dateTo=04/04/2006

last_name="Brown", documentId=2, dateFrom=01/01/2005, dateTo=04/04/2006

last_name="Green", documentId=1, dateFrom=04/05/2006, dateTo=06/06/2007

last_name="Green", documentId=2, dateFrom=04/05/2006, dateTo=06/06/2007

last_name="Green", documentId=2, dateFrom=06/07/2007, dateTo=02/01/2010

last_name="Green", documentId=3, dateFrom=06/07/2007, dateTo=02/01/2010

last_name="Black", documentId=4, dateFrom=02/03/2010, dateTo=NULL

My idea was to create a fact table with a composite key (personId, documentId, addressId, dateFrom), but I have no idea how to load this table and then get the expected result from this structure.

I would be grateful for any help!

Argnist asked May 21 '11 07:05




1 Answer

Interesting question @Argnist!

To establish some common language for my example, you want:

  • DimPerson (PK=kcPerson, surrogate key for unique Persons=kPerson, type 2 dim)
  • DimDocument (PK=kcDocument, surrogate key for unique Documents=kDocument, type 2 dim)
  • DimAddress (PK=kcAddress, surrogate key for unique Addresses=kAddress, type 2 dim)

A colleague has written a short blog post, 'Using Two Surrogate Keys on Dimensions', that explains the use of two surrogate keys in the dims above.

I would always add a DimDate dimension, with a PK in the form yyyymmdd plus extra attribute columns, to any data warehouse.
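As a rough illustration of the yyyymmdd key, here is a minimal Python sketch that generates DimDate rows. The helper names (`date_key`, `dim_date_rows`) and the attribute columns (`full_date`, `year`, `month`, `day`, `iso_weekday`) are assumptions chosen for the example; a real DimDate would carry whatever attributes your reports need.

```python
from datetime import date, timedelta


def date_key(d: date) -> int:
    """Encode a date as an integer in yyyymmdd form, e.g. 2010-02-03 -> 20100203."""
    return d.year * 10000 + d.month * 100 + d.day


def dim_date_rows(start: date, end: date) -> list[dict]:
    """Build one DimDate row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "kDate": date_key(current),            # PK in yyyymmdd form
            "full_date": current.isoformat(),      # hypothetical attribute columns
            "year": current.year,
            "month": current.month,
            "day": current.day,
            "iso_weekday": current.isoweekday(),   # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows


rows = dim_date_rows(date(2010, 1, 30), date(2010, 2, 2))
for row in rows:
    print(row)
```

Because the key sorts in calendar order, range filters on kDate work with plain integer comparisons.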

Then you would have your fact table as

  • FactHistory (FKs=kcPerson, kPerson, kcDocument, kDocument, kcAddress, kAddress, kDate) plus any additional measures.

Then joining on the "kc"s you can show the current Person/Document/Address dimension information. If you joined on the "k"s you can show the historic Person/Document/Address dimension information.

The downside of this is that this fact table needs one row for each person/document/address/date combination. But it really is a very narrow table, since the table just has a number of foreign keys.

The advantage of this is it is very easy to query for the sorts of questions you were asking.
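To show why the one-row-per-date design is easy to query, here is a minimal sketch using Python's built-in sqlite3 module. It is deliberately simplified: each dimension has a single key instead of the kc/k pair, only DimPerson is materialised, and the `last_name` column, the helper names and the sample dates are assumptions made up for the example.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimPerson (kPerson INTEGER PRIMARY KEY, personId INTEGER, last_name TEXT);
CREATE TABLE FactHistory (
    kPerson INTEGER, kDocument INTEGER, kAddress INTEGER, kDate INTEGER,
    PRIMARY KEY (kPerson, kDocument, kAddress, kDate)
);
-- Two versions of the same person (type 2): before and after a name change.
INSERT INTO DimPerson VALUES (1, 1, 'Brown'), (2, 1, 'Green');
""")


def date_key(d):
    """Encode a date as a yyyymmdd integer."""
    return d.year * 10000 + d.month * 100 + d.day


def load_days(k_person, k_document, k_address, start, end):
    """Insert one fact row per day for the given combination (inclusive range)."""
    current = start
    while current <= end:
        conn.execute(
            "INSERT INTO FactHistory VALUES (?, ?, ?, ?)",
            (k_person, k_document, k_address, date_key(current)),
        )
        current += timedelta(days=1)


# Illustrative data: the name change takes effect on 2006-04-05.
for doc in (1, 2):
    load_days(1, doc, 1, date(2006, 4, 3), date(2006, 4, 4))
    load_days(2, doc, 1, date(2006, 4, 5), date(2006, 4, 6))


def residents_on(k_address, d):
    """Question 1: who, with which documents, lived at an address on a date."""
    return conn.execute(
        """SELECT p.last_name, f.kDocument
           FROM FactHistory f JOIN DimPerson p ON p.kPerson = f.kPerson
           WHERE f.kAddress = ? AND f.kDate = ?
           ORDER BY f.kDocument""",
        (k_address, date_key(d)),
    ).fetchall()


print(residents_on(1, date(2006, 4, 4)))  # [('Brown', 1), ('Brown', 2)]
print(residents_on(1, date(2006, 4, 5)))  # [('Green', 1), ('Green', 2)]
```

The point-in-time question becomes a plain equality filter on kDate, at the cost of one row per combination per day.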

Alternatively, you could have your fact table as

  • FactHistory (FKs=kcPerson, kPerson, kcDocument, kDocument, kcAddress, kAddress, kDateFrom, kDateTo) plus any additional measures.

This is obviously much more compact, but the querying becomes more complex. You could also put a view over the Fact table to make it easier to query!
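The interval-based variant could be sketched as follows, again with Python's sqlite3 module. The rows reproduce the expected result from the question for addressId=1, reading its dates as mm/dd/yyyy (an assumption, though it makes each dateTo fall the day before the next dateFrom). The single-key schema, the `last_name` column, the view name `vHistory` and the helper names are likewise assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimPerson (kPerson INTEGER PRIMARY KEY, personId INTEGER, last_name TEXT);
CREATE TABLE FactHistory (
    kPerson INTEGER, kDocument INTEGER, kAddress INTEGER,
    kDateFrom INTEGER, kDateTo INTEGER,   -- yyyymmdd; NULL kDateTo = still current
    PRIMARY KEY (kPerson, kDocument, kAddress, kDateFrom)
);
-- A view hides the join so queries can filter on names and dates directly.
CREATE VIEW vHistory AS
    SELECT p.last_name, f.kDocument, f.kAddress, f.kDateFrom, f.kDateTo
    FROM FactHistory f JOIN DimPerson p ON p.kPerson = f.kPerson;
INSERT INTO DimPerson VALUES (1, 1, 'Brown'), (2, 1, 'Green'), (3, 2, 'Black');
""")

conn.executemany("INSERT INTO FactHistory VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 20050101, 20060404),
    (1, 2, 1, 20050101, 20060404),
    (2, 1, 1, 20060405, 20070606),
    (2, 2, 1, 20060405, 20070606),
    (2, 2, 1, 20070607, 20100201),
    (2, 3, 1, 20070607, 20100201),
    (3, 4, 1, 20100203, None),
    (2, 2, 2, 20100203, None),   # Mrs Green after moving to address 2
    (2, 3, 2, 20100203, None),
])


def residents_on(k_address, day_key):
    """Question 1: rows whose interval contains the given day."""
    return conn.execute(
        """SELECT last_name, kDocument FROM vHistory
           WHERE kAddress = ? AND kDateFrom <= ?
             AND (kDateTo IS NULL OR kDateTo >= ?)
           ORDER BY kDocument""",
        (k_address, day_key, day_key),
    ).fetchall()


def address_history(k_address, from_key, to_key):
    """Question 2: rows whose interval overlaps [from_key, to_key]."""
    return conn.execute(
        """SELECT last_name, kDocument, kDateFrom, kDateTo FROM vHistory
           WHERE kAddress = ? AND kDateFrom <= ?
             AND (kDateTo IS NULL OR kDateTo >= ?)
           ORDER BY kDateFrom, kDocument""",
        (k_address, to_key, from_key),
    ).fetchall()


for row in address_history(1, 20000101, 20991231):
    print(row)
```

The overlap test (interval starts before the window ends, and ends after the window starts or is still open) is the extra complexity this design trades for far fewer rows.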

The choice of solution depends on how frequently the data changes. I suspect that it will not change that quickly, so the alternate design of the fact table may be better.

Hope that helps.

Marcus D answered Sep 21 '22 04:09