
Cross reference Facts and Dimensions in Data warehouse

I am trying to design a data warehouse for a licensing vendor who sells licenses through ecommerce and various other venues. They want to track sales, product lifecycle, and activity. This means there are different sale types (such as new purchase, promotional purchase, renewal) and different events/states of a license: a license can get installed, renewed, activated, or registered. A license can get renewed many times (on different dates).

So I was thinking my dimensions would be very simple: date, product, source, saletype, and event/state. I would have two fact tables, one for sales and another for the events, both with foreign keys to the dimension tables. Each fact table would be an accumulating fact table where every event adds a new row, so a license can appear more than once.

However, the requirements state that users must be able to cross-reference these two facts and the saletype and event dimensions. For example, if someone sees that product 'A' has 100 sales in the US ecommerce store of type 'new purchase', they want to see how many of 'those' 100 licenses also got activated. Then they may want to see, out of the licenses that were activated, how many were registered, and then (back to saletype) how many of those that registered also 'renewed'. I cannot really define a hierarchy, because there are a whole lot of possible combinations of these.

How can I do this? As I'm reading, I find there seems to be no way to relate the two facts based on the license itself (which is what I need to do).

I was also thinking that maybe I could have one fact table and 'technically' combine the saletype and the eventtype into one big eventtype dimension. The fact table would then be a big transaction fact table with an eventid foreign key to the events dimension. But then I have a fact table with a row for every event that happens to a license. The license is repeated, and one event can appear for a license more than once (on different dates). So, if I choose to see the totals for an event, how can I see how many of those licenses also exist for a different event?

I need to provide all these numbers as measures, so that a business user can see them on the fly (using whatever OLAP browser they want to use).

Note: I am using SQL Server Analysis Services and SQL Server 2008 R2.

Just as a reference, this is what I have now:

  1. DimProducts (PK: ProductID, and other attributes)
  2. DimDate (PK: DateKey, and other attributes)
  3. DimEvent (PK: EventID, and other attributes)

  4. FactLicenses (FK: ProductID; FK: DateKey; FK: EventID; and a License field (varchar))

So I have a license repeated, with a row for every time something happens to the license (installed, activated, renewed, cancelled, renewed again). One license can have the same EventID more than once, but never on the same DateKey. The primary key of the table is DateKey + EventID + License.
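To make the question concrete, here is a minimal sketch of the table above and of the "how many of those licenses also had another event" count. It uses SQLite through Python's `sqlite3` purely as a stand-in for SQL Server; the sample licenses, the event IDs and the helper `licenses_with_both` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimEvent (EventID INTEGER PRIMARY KEY, EventName TEXT);
CREATE TABLE FactLicenses (
    ProductID INTEGER NOT NULL,
    DateKey   INTEGER NOT NULL,
    EventID   INTEGER NOT NULL REFERENCES DimEvent(EventID),
    License   TEXT    NOT NULL,
    PRIMARY KEY (DateKey, EventID, License)
);
""")

# Hypothetical event IDs and sample rows (assumptions, not from the post).
conn.executemany(
    "INSERT INTO DimEvent VALUES (?, ?)",
    [(1, "new purchase"), (2, "activated"), (3, "renewed")],
)
conn.executemany(
    "INSERT INTO FactLicenses VALUES (?, ?, ?, ?)",
    [
        (10, 20220101, 1, "LIC-A"),
        (10, 20220102, 2, "LIC-A"),
        (10, 20220201, 3, "LIC-A"),
        (10, 20220301, 3, "LIC-A"),  # same event again, on a different date
        (10, 20220101, 1, "LIC-B"),
        (10, 20220105, 1, "LIC-C"),
        (10, 20220106, 2, "LIC-C"),
    ],
)

def licenses_with_both(first_event, second_event):
    """Count distinct licenses that have a row for both event IDs."""
    sql = """
        SELECT COUNT(DISTINCT f1.License)
        FROM FactLicenses AS f1
        JOIN FactLicenses AS f2 ON f2.License = f1.License
        WHERE f1.EventID = ? AND f2.EventID = ?
    """
    return conn.execute(sql, (first_event, second_event)).fetchone()[0]

purchased_and_activated = licenses_with_both(1, 2)
purchased_and_renewed = licenses_with_both(1, 3)
```

The self-join on `License` is what makes the cross-reference possible in plain SQL; the difficulty described here is exposing that same license-level link as a measure in the cube.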

EDIT:

I've read in many places that the fact table in a situation like this should be an accumulating fact table, which has multiple columns pointing to the same type of dimension (i.e. date), and that I should create a role-playing dimension for each of those columns. But how do you account for the fact that a license can get renewed multiple times, installed multiple times, and so on?

M.R. asked Nov 14 '22 00:11



1 Answer

I've since gone back to Ralph Kimball's book and found a case study that solves this issue for me. I've also merged the sale types and event types into one major group. Given that, there are still two groups of things: things that can happen to a license once, and things that can happen to a license multiple times. Everything that can happen to a license once is now stored in an accumulating fact table. Everything that can happen to a license multiple times is stored in a different table (a separate table for each entity or 'type' of event).
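A minimal sketch of that split, again using SQLite through Python's `sqlite3` as a stand-in for SQL Server. The table and column names are hypothetical, and treating purchase, activation and registration as once-only milestones is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Accumulating fact table: one row per license, one date column
-- per once-only milestone (hypothetical column names).
CREATE TABLE FactLicenseLifecycle (
    License           TEXT PRIMARY KEY,
    ProductID         INTEGER NOT NULL,
    PurchaseDateKey   INTEGER NOT NULL,
    ActivatedDateKey  INTEGER,   -- NULL until the milestone happens
    RegisteredDateKey INTEGER
);
-- Separate table for an event that can repeat: one row per renewal.
CREATE TABLE FactRenewals (
    License        TEXT    NOT NULL REFERENCES FactLicenseLifecycle(License),
    RenewalDateKey INTEGER NOT NULL,
    PRIMARY KEY (License, RenewalDateKey)
);
""")

# A purchase creates the single lifecycle row for the license.
conn.execute(
    "INSERT INTO FactLicenseLifecycle (License, ProductID, PurchaseDateKey) "
    "VALUES ('LIC-A', 10, 20220101)"
)
# A later once-only milestone updates that row instead of adding a new one.
conn.execute(
    "UPDATE FactLicenseLifecycle SET ActivatedDateKey = 20220102 "
    "WHERE License = 'LIC-A'"
)
# Each renewal adds its own row in the repeating-event table.
conn.executemany(
    "INSERT INTO FactRenewals VALUES (?, ?)",
    [("LIC-A", 20230101), ("LIC-A", 20240101)],
)

lifecycle_rows = conn.execute(
    "SELECT COUNT(*) FROM FactLicenseLifecycle").fetchone()[0]
renewal_rows = conn.execute(
    "SELECT COUNT(*) FROM FactRenewals").fetchone()[0]
```

The license stays unique in the accumulating table, so it can act as the key that the repeating-event tables point back to.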

This effectively solved the problem for me, because in Analysis Services I can now define a 'referenced' relationship in which the license is the link. Any dimension related to one of the multiple-occurrence tables can then be linked through the original accumulating fact table (which has the license column).
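In relational terms, linking through the license amounts to joining the repeating-event table back to the accumulating table on the license column. The sketch below is not Analysis Services configuration; it only illustrates that join with SQLite through Python's `sqlite3`, and every table name, column name and sample row in it is hypothetical (including keeping the sale type as a column on the lifecycle row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactLicenseLifecycle (
    License           TEXT PRIMARY KEY,
    SaleType          TEXT NOT NULL,
    ActivatedDateKey  INTEGER,
    RegisteredDateKey INTEGER
);
CREATE TABLE FactRenewals (
    License        TEXT    NOT NULL,
    RenewalDateKey INTEGER NOT NULL,
    PRIMARY KEY (License, RenewalDateKey)
);
INSERT INTO FactLicenseLifecycle VALUES
    ('LIC-A', 'new purchase',         20220102, 20220103),
    ('LIC-B', 'new purchase',         20220105, NULL),
    ('LIC-C', 'new purchase',         NULL,     NULL),
    ('LIC-D', 'promotional purchase', 20220110, 20220111);
INSERT INTO FactRenewals VALUES
    ('LIC-A', 20230101), ('LIC-A', 20240101), ('LIC-D', 20230110);
""")

# Funnel for one sale type: sold -> activated -> registered -> renewed.
# Renewals are collapsed to distinct licenses first, so a license that
# renewed twice is still counted once.
funnel_sql = """
    SELECT
        COUNT(*)                  AS sold,
        COUNT(l.ActivatedDateKey) AS activated,
        SUM(l.ActivatedDateKey IS NOT NULL
            AND l.RegisteredDateKey IS NOT NULL) AS registered,
        SUM(l.ActivatedDateKey IS NOT NULL
            AND l.RegisteredDateKey IS NOT NULL
            AND r.License IS NOT NULL)           AS renewed
    FROM FactLicenseLifecycle AS l
    LEFT JOIN (SELECT DISTINCT License FROM FactRenewals) AS r
        ON r.License = l.License
    WHERE l.SaleType = ?
"""
sold, activated, registered, renewed = conn.execute(
    funnel_sql, ("new purchase",)
).fetchone()
```

Each step of the funnel narrows the same set of licenses, which is the "of those 100, how many also..." behaviour asked about in the question.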

Thanks to everyone who tried to answer.

M.R. answered Dec 10 '22 06:12
