Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

date/time dimension

I am designing a data warehouse and I have a sticky issue with time. The grain I need is hourly (to calculate aggregate counts of events per hour) and I also have to accommodate a shift pattern that does not conveniently fit inside a 24 hour period (in fact it is possible that 'blue' shift wont cover the same time of day for several days).

With this in mind I am contemplating one of 3 approaches

  1. a single time dimension with 175K rows in it.
  2. a snowflake time dimension with 7300 rows in a calendar dimension and 175k rows in a time dimension
  3. separate dimensions so that the fact table has foreign keys for the event date and for the event time.

I am tending towards approach 3 as it allows the small calendar dimension to be referenced separately in joins, but I would appreciate any thoughts.

like image 210
dfoster99 Avatar asked Feb 08 '11 07:02

dfoster99


People also ask

What is a time dimension data?

The Date Dimension table has attributes that include day, date, weeks, quarter, months, fiscal period, national holiday indicators, quarter, and year. The Time Dimension table has attributes that include hour formats, minute of the day, seconds of hour, quarter hour, and am or pm.

Is a date a dimension?

"Date" is a common dimension, with several possible hierarchies: "Days (are grouped into) Months (which are grouped into) Years", "Days (are grouped into) Weeks (which are grouped into) Years" "Days (are grouped into) Months (which are grouped into) Quarters (which are grouped into) Years"

Why is date a dimension?

A date dimension is an essential table in a data model that allows us to analyze performance more effectively across different time periods. It should be included in every dimensional model that contains a date or requires date intelligence as part of the analysis.


1 Answers

Yes, manufacturing shifts are tricky and do change over time, often one shift starts day before, etc.

Keep in mind that there are two calendars here. One is the standard calendar and the other is the production calendar -- the shift belongs to the production calendar. In general, a day in production calendar may last more (or less) than 24 hours.

For example:

Part produced on Monday, 2011-02-07 23:45 may look like

TimeOfProduction = '2011-02-07 23:45'
DateKey = 20110207
TimeKey = 2345
ProductionDateKey = 20110208 (the first shift of the next day started at 22:00)
ProductionTimeKey = 145 (1 hour and 45 minutes of the current production date)     
ShiftKey = 1
ShiftTimeKey = 145 (1 hour and 45 minutes of the current shift)

So, my suggestion is:

  1. Plain Date Dimension (one row per date)
  2. Plain Time Dimension (one row per minute for 24 hours = 1440 rows + see note below)
  3. Shift Dimension -- type 2 dimension with rw_ValidFrom, (rw_ValidTo) , rw_IsCurrent
  4. Role-play the DateKey into ProductionDateKey
  5. Role-play the TimeKey into a ProductionTimeKey and ShiftTimeKey.
  6. Keep the TimeOfProduction (datetime) in the fact table too.
  7. During the ETL process, apply the current shift logic to attach ProductionDateKey, ProductionTimeKey, ShiftKey, ShiftTimeKey to each row of the factPart table.

Note that you may need to add extra rows to the Time Dimension if a production day can last more than 24 hours. It usually can if a local time is used and there is a daylight savings time jump.

So, the star may look something like this

enter image description here

like image 126
Damir Sudarevic Avatar answered Sep 28 '22 05:09

Damir Sudarevic