Merge Facts from Different Sources? Or Load Separately?

Question

We've got data with two different origins: some comes from a customer, some comes from different vendors. Currently, we physically "merge" this data into a massive table with almost a hundred columns, tens of thousands of rows and no formal separation of the two dimensions. Consequently, we can't actually use this table for much.

I'm going to redesign this mess into a proper, but small, star schema.

The two dimensions are obvious. One of them, for example, is time.

The customer-supplied data provides a number of fact values. Each vendor may (or may not) provide additional fact values that fit the same dimensions.

This fact data all has the same granularity. It can be called "sparse" because we don't often get information from all vendors.

Here's my dilemma.

Is this one fact table -- with some nulls -- populated from different sources?

Or is this n+1 fact tables -- one populated from the customer, the others populated from each vendor?

There are pros and cons to each design. I need some second opinions on the choice between "merge" or "load separately".

Customer supplies revenue, cost, counts, weights, and other things they know about their end of a transaction.

Vendor one supplies some additional details about some of the transactions -- weights, costs, durations. The other transactions will have no value from vendor one.

Vendor two supplies some additional details about some of the transactions -- volumes, durations, lengths, foreign currency rates. The other transactions will have no value for vendor two.

Some transactions will have both vendors. A few transactions will have neither vendor.

One table with nulls? Three tables?

Bell · Accepted Answer

I'd go for the single fact table. The highlight pro of this approach is that it leaves all the hard work at load time rather than at query time.

Bob Probst · Answer

From what you describe, it sounds like a single fact table is the way to go.

It sounds like the fact table would have a grain of time x transaction x customer(?).

My prior question was really trying to find out if some of the vendor data was a candidate for its own dimension. I'll leave it to you to determine that. but it doesn't really sound like it.

Null facts can throw warnings during aggregations (depending on the platform) but the alternative of populating them with possibly misleading zeros is worse.

Merge Facts from Different Sources? Or Load Separately?

Tags:

database-design

data-warehouse

S.Lott

2 Answers

Bell

Bob Probst

Recent Activity

Donate For Us

Merge Facts from Different Sources? Or Load Separately?

Tags:

database-design

data-warehouse

S.Lott

2 Answers

Bell

Bob Probst

Related questions

Recent Activity

Donate For Us