Given the following star schema tables.
# geog_abb time_date amount value
#1: AL 2013-03-26 55.57 9113.3898
#2: CO 2011-06-28 19.25 9846.6468
#3: MI 2012-05-15 94.87 4762.5398
#4: SC 2013-01-22 29.84 649.7681
#5: ND 2014-12-03 37.05 6419.0224
# geog_abb geog_name geog_division_name geog_region_name
#1: AK Alaska Pacific West
#2: AL Alabama East South Central South
#3: AR Arkansas West South Central South
#4: AZ Arizona Mountain West
#5: CA California Pacific West
# time_date time_weekday time_week time_month time_month_name time_quarter time_quarter_name time_year
#1: 2010-01-01 Friday 1 1 January 1 Q1 2010
#2: 2010-01-02 Saturday 1 1 January 1 Q1 2010
#3: 2010-01-03 Sunday 1 1 January 1 Q1 2010
#4: 2010-01-04 Monday 1 1 January 1 Q1 2010
#5: 2010-01-05 Tuesday 1 1 January 1 Q1 2010
Examples is stripped of surrogate keys to improve readability. In results there are levels in hierarchy without other attributes, just don't bother that, they are still levels in hierarchy.
GEOGRAPHY (all fields)
/
/
FACT
\
\
TIME (all fields)
geog_region_name
/
geog_division_name
/
geog_abb (+ geog_name)
/
/
FACT
\
\
time_date
|
hierarchies: |
weekly / \ monthly
/ \
/ \
time_weekday time_month (+ time_month_name)
| |
| |
time_week time_quarter (+ time_quarter_name)
| |
| |
time_year time_year
Does it have any specific name? Starflake? :)
|>-- geog_region_name
|
|>-- geog_division_name
|
|>-- geog_abb (+ geog_name)
|
|
geography base
/
/
FACT
\
\
time base
|
|
|>-- time_date
|
|>-- time_weekday
|
|>-- time_week
|
|>-- time_month (+ time_month_name)
|
|>-- time_quarter (+ time_quarter_name)
|
|>-- time_year
It basically has a dimension base table storing identities of every level of every hierarchy within a dimension. No need for recursive walk through snowflake's levels, potentially less joins. Data still well normalized, only keys are denormalized into base table. All levels from all hierarchies tied to lowest grain key of a dimension in dimension base.
Additionally having a dimension base table allows to handle time variant attributes/temporal queries just in that table, at the granularity of a hierarchy level.
Still on natural keys!
# geog_abb time_date amount value
# 1: AK 2010-01-01 154.43 12395.472
# 2: AK 2010-01-02 88.89 6257.639
# 3: AK 2010-01-03 81.74 7193.075
# 4: AK 2010-01-04 165.87 11150.619
# 5: AK 2010-01-05 8.75 6953.055
# time_date time_year time_quarter time_month time_week time_weekday
# 1: 2010-01-01 2010 1 1 1 Friday
# 2: 2010-01-02 2010 1 1 1 Saturday
# 3: 2010-01-03 2010 1 1 1 Sunday
# 4: 2010-01-04 2010 1 1 1 Monday
# 5: 2010-01-05 2010 1 1 1 Tuesday
# time_year
# 1: 2010
# 2: 2011
# 3: 2012
# 4: 2013
# 5: 2014
# time_quarter time_quarter_name
# 1: 1 Q1
# 2: 2 Q2
# 3: 3 Q3
# 4: 4 Q4
# time_month time_month_name
# 1: 1 January
# 2: 2 February
# 3: 3 March
# 4: 4 April
# 5: 5 May
# time_week
# 1: 1
# 2: 2
# 3: 3
# 4: 4
# 5: 5
# time_weekday
# 1: Friday
# 2: Monday
# 3: Saturday
# 4: Sunday
# 5: Thursday
# time_date time_week time_weekday time_year
# 1: 2010-01-01 1 Friday 2010
# 2: 2010-01-02 1 Saturday 2010
# 3: 2010-01-03 1 Sunday 2010
# 4: 2010-01-04 1 Monday 2010
# 5: 2010-01-05 1 Tuesday 2010
# geog_abb geog_region_name geog_division_name
# 1: AK West Pacific
# 2: AL South East South Central
# 3: AR South West South Central
# 4: AZ West Mountain
# 5: CA West Pacific
# geog_region_name
# 1: North Central
# 2: Northeast
# 3: South
# 4: West
# geog_division_name
# 1: East North Central
# 2: East South Central
# 3: Middle Atlantic
# 4: Mountain
# 5: New England
# geog_abb geog_name geog_division_name geog_region_name
# 1: AK Alaska Pacific West
# 2: AL Alabama East South Central South
# 3: AR Arkansas West South Central South
# 4: AZ Arizona Mountain West
# 5: CA California Pacific West
Dimension base could store also primary key's attributes, this would de-duplicate dimension's lowest level but will be less consistent (time_date
levels from both hierarchies would fit into time dimension base tables).
What drawbacks such schema would have? I don't much bother about speed of joins and aggregates, and a query tool adaptivity.
Does it have any name? It is being use? If not why?
Star schemas denormalize the data, which means adding redundant columns to some dimension tables to make querying and working with the data faster and easier.
In a simple star schema, the primary key for the fact table consists of one or more foreign keys. A foreign key is a column or group of columns in one table whose values are defined by the primary key in another table.
Query performance. Dimensional models are more denormalized and optimized for data querying, while normalized models seek to eliminate data redundancies and are optimized for transaction loading and updating.
There is no normalization in the Star Schema. In the Snowflake Schema, there is both normalization and denormalization. It has a very simple design.
You are building a snowflake schema with shortcuts.
It's used and BI tools can easily use the shortcuts.
You can also have shortcuts from a parent level of a dimension to a fact table at child level for that dimension. It works, you can skip a join, but you need to store an additional column in the fact table.
The only concern is about data integrity, if a parent-child relationship changes you need to update not only the child table, but also all other tables where this relationship is stored.
It's not a big deal if you generate every time your dimension table from your normalize data, but you need to be careful, even more if you store a parent ID in the fact table.
What you are doing is not a snowflake schema ...it is similar to "Data Vault" and our own variation "Link-Model". It essentially creates link tables just containing keys which sit between Fact tables and Dim tables (and other Dim tables). Although, we describe them as entity tables and measure tables.
The advantages are
The downsides are
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With