I am creating a data model for customer invoices in a large data warehouse.
The following shows the fields on a typical invoice:
The following is the data model I worked out so far to model the invoices:
Conventional wisdom is that a large data warehouse should use a star schema, which means one fact table, but it seems that to model an invoice I would need two fact tables, as shown above. Would it be correct to use two fact tables?
I recommend you avoid multiple grain fact tables where possible.
Since Invoice Fact contains Total Shipping
and Total Tax
, to boil this down to Invoice Detail Fact, there are two basic options that I can think of:
Create Tax and Freight columns in your Invoice Detail fact and distribute amongst your items. This Kimball Tip suggests exactly that: http://www.kimballgroup.com/2001/07/01/design-tip-25-designing-dimensional-models-for-parent-child-applications/.
An alternative approach which has worked well for me is to create two new members in your product dimension. One for tax and one for freight. Then add these two line items to the fact just like a normal product with appropriate values.
When you analyse by Invoice ID you get the total including Tax & Freight. When you analyse by individual product you don't get a misleading Freight or Tax figure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With