This isn't exactly what I am doing but I feel this is a good example:
Let's say I have a Product dimension table that connects to my ProductSales Fact table. Each row in dimProduct holds all the relevant data for a single product (code, name, description etc) and there are around a million products.
I now have a requirement to store the product categories in the warehouse. Each product has multiple categories, averaging 5.
Am I supposed to duplicate entire rows in the Product dimension for each category the product fits into, or am I supposed to snowflake my current star schema with a dimCategory dimension and a dimProductCategory link table between the two?
I'm afraid that if I do the former, my dimension table will become about 5 times bigger, and if I do the latter, the model will become far more complex.
A “Multi Valued Dimension” is a dimension with more than 1 value per fact row. As always it is best to explain by example: DimCustomer.
A common approach for handling multivalued dimensions is to introduce a bridge table. The following figure shows a bridge table to associate multiple customers with an account. In this case, the bridge contains one row for each customer associated with an account.
A Junk Dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. The nature of these attributes is usually text or various flags, e.g. non-generic comments or just simple yes/no or true/false indicators.
Well, for a newcomer your question is rather insightful!
If each of your products can be categorized into multiple categories (and each product category contains any number of products), then the cardinality between Product and Product Category is many-to-many. When you have many-to-many cardinality, direct snowflaking is not the solution.
But I think what you mean by snowflaking here is the use of a link table between Category and Product. In my opinion, that is the correct approach. But I would rather call this table a factless fact table.
Snowflaking is the solution for a one-to-many cardinality problem (e.g. one category contains multiple products). To resolve the many-to-many cardinality, you will need a factless fact table that stores the keys from both the Category and Product tables.
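To make the factless fact table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name factProductCategory, the column names, and the sample rows are all assumptions made up for illustration; only dimProduct, dimCategory and the ProductSales fact come from the question.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory star schema with a product/category bridge."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
        CREATE TABLE dimCategory (CategoryKey INTEGER PRIMARY KEY, CategoryName TEXT);
        -- Factless fact (hypothetical name): one row per product/category pair,
        -- holding only the two keys and no measures.
        CREATE TABLE factProductCategory (
            ProductKey INTEGER REFERENCES dimProduct(ProductKey),
            CategoryKey INTEGER REFERENCES dimCategory(CategoryKey),
            PRIMARY KEY (ProductKey, CategoryKey)
        );
        CREATE TABLE factProductSales (
            SaleKey INTEGER PRIMARY KEY,
            ProductKey INTEGER REFERENCES dimProduct(ProductKey),
            Amount REAL
        );
        -- Made-up sample data: the kettle belongs to two categories.
        INSERT INTO dimProduct VALUES (1, 'Kettle'), (2, 'Tent');
        INSERT INTO dimCategory VALUES (10, 'Kitchen'), (20, 'Camping');
        INSERT INTO factProductCategory VALUES (1, 10), (1, 20), (2, 20);
        INSERT INTO factProductSales VALUES (100, 1, 30.0), (101, 2, 120.0);
    """)
    return conn


def sales_by_category(conn):
    """Sum sales per category by joining through the factless fact table."""
    rows = conn.execute("""
        SELECT c.CategoryName, SUM(s.Amount)
        FROM factProductSales AS s
        JOIN factProductCategory AS pc ON pc.ProductKey = s.ProductKey
        JOIN dimCategory AS c ON c.CategoryKey = pc.CategoryKey
        GROUP BY c.CategoryName
        ORDER BY c.CategoryName
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(sales_by_category(build_warehouse()))
```

Note that dimProduct keeps one row per product, so it does not grow with the number of categories. The trade-off is that a sale of a product in several categories is counted once per category, so the per-category totals should not be added together to get overall sales.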
Remember, if the transactional data that you load into your ProductSales fact table already contains both Category and Product details, you might as well include both the Category ID and the Product ID in your ProductSales fact table. You do this when you do not need to maintain any fixed relationship between products and categories, and the relationship is instead driven by the events that occur in the actual business.
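As a rough sketch of this second option, again with sqlite3 and invented column names and sample rows, the fact table can carry both keys directly, so that each sale records the category it was actually sold under:

```python
import sqlite3


def build_sales_with_category():
    """Create a fact table that stores both the product and category keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
        CREATE TABLE dimCategory (CategoryKey INTEGER PRIMARY KEY, CategoryName TEXT);
        -- Each sale row names the one category it occurred under (assumed
        -- to be present in the source transaction); no bridge table is needed.
        CREATE TABLE factProductSales (
            SaleKey INTEGER PRIMARY KEY,
            ProductKey INTEGER REFERENCES dimProduct(ProductKey),
            CategoryKey INTEGER REFERENCES dimCategory(CategoryKey),
            Amount REAL
        );
        -- Made-up sample data: the kettle is sold under two different categories.
        INSERT INTO dimProduct VALUES (1, 'Kettle'), (2, 'Tent');
        INSERT INTO dimCategory VALUES (10, 'Kitchen'), (20, 'Camping');
        INSERT INTO factProductSales VALUES
            (100, 1, 10, 30.0), (101, 1, 20, 45.0), (102, 2, 20, 120.0);
    """)
    return conn


def sales_by_category_direct(conn):
    """Sum sales per category using the key stored on the fact row."""
    rows = conn.execute("""
        SELECT c.CategoryName, SUM(s.Amount)
        FROM factProductSales AS s
        JOIN dimCategory AS c ON c.CategoryKey = s.CategoryKey
        GROUP BY c.CategoryName
        ORDER BY c.CategoryName
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(sales_by_category_direct(build_sales_with_category()))
```

Because every sale row belongs to exactly one category here, the per-category totals add up to total sales, which is not the case when joining through a bridge.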