
Dimension row with multiple attributes

This isn't exactly what I am doing but I feel this is a good example:

Let's say I have a Product dimension table that connects to my ProductSales Fact table. Each row in dimProduct holds all the relevant data for a single product (code, name, description etc) and there are around a million products.

I now have a requirement to store the product categories in the warehouse. Each product has multiple categories, about 5 on average.

Am I supposed to duplicate entire rows in the Product Dimension for each category the product fits into or am I supposed to snowflake my current star schema with a dimCategory dimension and dimProductCategory link table between the two?

I'm afraid that if I do the former, my dimension table will become about 5 times bigger, and if I do the latter, the model will become far more complex.

Timothy Jeffreys asked Jan 20 '14 21:01



1 Answer

Well, for a newcomer your question is rather insightful!

If each of your products can be placed in multiple categories (and each category contains many products), then the cardinality between Product and Product Category is many-to-many. With many-to-many cardinality, direct snowflaking is not the solution.

But I think what you mean by snowflaking here is the use of a link table between Category and Product, and in my opinion that is the correct approach. I would rather call this table a factless fact table, though.

Snowflaking is the solution for a one-to-many cardinality problem (e.g. one category contains multiple products, but each product belongs to only one category). To resolve many-to-many cardinality, you need a factless fact table that stores the keys from both the Category and Product tables.
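As a rough illustration of the link-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name factProductCategory, the column names, and the sample rows are all made up for the example; only dimProduct, dimCategory and ProductSales come from the question.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimProduct (product_key INTEGER PRIMARY KEY, code TEXT, name TEXT);
CREATE TABLE dimCategory (category_key INTEGER PRIMARY KEY, name TEXT);

-- Factless fact (link) table: one row per product/category pair, no measures.
CREATE TABLE factProductCategory (
    product_key  INTEGER REFERENCES dimProduct(product_key),
    category_key INTEGER REFERENCES dimCategory(category_key),
    PRIMARY KEY (product_key, category_key)
);

CREATE TABLE factProductSales (
    product_key INTEGER REFERENCES dimProduct(product_key),
    amount      REAL
);

INSERT INTO dimProduct VALUES (1, 'P-1', 'Trail shoe'), (2, 'P-2', 'Rain jacket');
INSERT INTO dimCategory VALUES (10, 'Footwear'), (20, 'Outdoor'), (30, 'Clothing');
INSERT INTO factProductCategory VALUES (1, 10), (1, 20), (2, 20), (2, 30);
INSERT INTO factProductSales VALUES (1, 100.0), (1, 50.0), (2, 80.0);
""")

# Sales per category: join the sales fact to the category through the link table.
sales_by_category = conn.execute("""
    SELECT c.name, SUM(s.amount)
    FROM factProductSales s
    JOIN factProductCategory pc ON pc.product_key = s.product_key
    JOIN dimCategory c ON c.category_key = pc.category_key
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Grand total taken straight from the sales fact, without the link table.
total_sales = conn.execute("SELECT SUM(amount) FROM factProductSales").fetchone()[0]

print(sales_by_category)
print(total_sales)
```

dimProduct keeps one row per product, so it does not grow with the number of categories. The trade-off is that a sale of a product in several categories is counted once under each of them, so the per-category figures add up to more than the grand total and should not be summed across categories.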

Remember: if the transactional data you load into your ProductSales fact table already contains both category and product details, you may want to include both the Category ID and the Product ID in the ProductSales fact table itself. This fits when you do not need to maintain a fixed relationship between products and categories, and the relationship is instead driven by the events that actually occur in the business.
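A minimal sketch of that alternative, again with Python's sqlite3 module and invented column names and rows, assuming each source transaction carries exactly one category:

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimProduct (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dimCategory (category_key INTEGER PRIMARY KEY, name TEXT);

-- Each sales row records the category the sale was made under.
CREATE TABLE factProductSales (
    product_key  INTEGER REFERENCES dimProduct(product_key),
    category_key INTEGER REFERENCES dimCategory(category_key),
    amount       REAL
);

INSERT INTO dimProduct VALUES (1, 'Trail shoe'), (2, 'Rain jacket');
INSERT INTO dimCategory VALUES (10, 'Footwear'), (20, 'Outdoor');
INSERT INTO factProductSales VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 20, 80.0);
""")

# Sales per category come straight from the fact table; no link table needed.
category_totals = conn.execute("""
    SELECT c.name, SUM(s.amount)
    FROM factProductSales s
    JOIN dimCategory c ON c.category_key = s.category_key
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# The product/category relationship is whatever the recorded sales imply.
observed_pairs = conn.execute("""
    SELECT DISTINCT product_key, category_key
    FROM factProductSales
    ORDER BY product_key, category_key
""").fetchall()

print(category_totals)
print(observed_pairs)
```

Here each sale lands in exactly one category, so the per-category totals add up to the grand total. A product/category pair only shows up once a sale has been recorded for it, which is the "driven by events" behaviour described above.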

hashbrown answered Sep 30 '22 15:09