Time and date dimension in data warehouse

I'm building a data warehouse. Each fact has its own timestamp. I need to create reports by day, month, and quarter, but by hour too. Looking at the examples, I see that dates tend to be stored in dimension tables.
[Image: star schema example (source: etl-tools.info)]

But I don't think that makes sense for time: the dimension table would just keep growing. On the other hand, a JOIN with a date dimension table is more efficient than using date/time functions in SQL.
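To put numbers on the growth concern, here is a minimal sketch (Python standard library only) that builds a hypothetical date dimension with day, month, and quarter attributes and compares its size with a single combined date-and-minute dimension. The column names such as `date_key` and the 2010 to 2019 range are illustrative assumptions, not anything Infobright prescribes.

```python
from __future__ import annotations

from datetime import date, timedelta

MINUTES_PER_DAY = 24 * 60  # rows in a minute-grain time-of-day dimension


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Return one row per calendar day in [start, end] with month/quarter attributes."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # smart key, e.g. 20100324
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_month": day.day,
        })
        day += timedelta(days=1)
    return rows


if __name__ == "__main__":
    dates = build_date_dimension(date(2010, 1, 1), date(2019, 12, 31))
    # Separate dimensions: the date table grows by one row per day,
    # while the time-of-day table stays fixed at MINUTES_PER_DAY rows.
    print("date dimension rows:", len(dates))
    print("time-of-day dimension rows:", MINUTES_PER_DAY)
    # A single date+minute dimension grows by MINUTES_PER_DAY rows per day.
    print("combined date+minute rows:", len(dates) * MINUTES_PER_DAY)
```

For 2010 through 2019 this gives 3,652 date rows against 5,258,880 rows for a combined date-and-minute table, which is one argument for splitting date and time of day into separate dimensions.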

What are your opinions or solutions?

(I'm using Infobright)

Piotr Gwiazda, asked Mar 24 '10 11:03


2 Answers

Kimball recommends having separate time- and date dimensions:

Kimball Design Tip #51: Latest Thinking On Time Dimension Tables

In previous Toolkit books, we have recommended building such a dimension with the minutes or seconds component of time as an offset from midnight of each day, but we have come to realize that the resulting end user applications became too difficult, especially when trying to compute time spans. Also, unlike the calendar day dimension, there are very few descriptive attributes for the specific minute or second within a day. If the enterprise has well defined attributes for time slices within a day, such as shift names, or advertising time slots, an additional time-of-day dimension can be added to the design where this dimension is defined as the number of minutes (or even seconds) past midnight. Thus this time-of-day dimension would either have 1440 records if the grain were minutes or 86,400 records if the grain were seconds.
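A time-of-day dimension along the lines described in the tip can be sketched as follows, assuming the key is simply the number of minutes (or seconds) past midnight. The `shift_name` function and its boundaries are hypothetical, standing in for the enterprise-specific attributes (shift names, advertising slots) the tip mentions.

```python
from __future__ import annotations

from datetime import datetime


def shift_name(hour: int) -> str:
    """Hypothetical three-shift schedule; replace with the business's real definitions."""
    if 6 <= hour < 14:
        return "early"
    if 14 <= hour < 22:
        return "late"
    return "night"


def build_time_of_day_dimension(grain_seconds: int = 60) -> list[dict]:
    """One row per time slice in a day; the key is the slice offset from midnight.

    grain_seconds is assumed to divide 86,400 evenly (e.g. 1, 60, 900, 3600).
    """
    rows = []
    for key in range(24 * 60 * 60 // grain_seconds):
        seconds = key * grain_seconds
        hour, rem = divmod(seconds, 3600)
        rows.append({
            "time_key": key,  # minutes past midnight when grain_seconds == 60
            "hour": hour,
            "minute": rem // 60,
            "second": rem % 60,
            "shift": shift_name(hour),
        })
    return rows


def time_key_for(ts: datetime, grain_seconds: int = 60) -> int:
    """Map a fact timestamp to its time-of-day key (only the time part is used)."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return seconds // grain_seconds
```

With the default minute grain the table has 1440 rows, and `time_key_for(datetime(2010, 3, 24, 10, 30, 15))` returns 630; with `grain_seconds=1` the table has 86,400 rows. The date part of the timestamp would go to a separate date dimension key.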

davek, answered Oct 02 '22 12:10


My guess is that it depends on your reporting requirements. If you need something like

WHERE "Hour" = 10

meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than

WHERE date_part('hour', TimeStamp) = 10  

because the date_part() function will be evaluated for every row. You should still keep the TimeStamp in the fact table so that you can aggregate across day boundaries, as in:

WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15' 

which gets awkward when using dimension fields.
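Both query styles from this answer can be tried end to end with a minimal sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in here: it has no `date_part()`, so `strftime('%H', ...)` plays that role, and the table and column names (`fact_sales`, `dim_time`, `time_key`, `amount`) are hypothetical. The fact table keeps both the full timestamp and a time-of-day key, as the answer suggests.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """In-memory star schema: a minute-grain time dimension and a tiny fact table."""
    conn = sqlite3.connect(":memory:")
    # Time-of-day dimension: time_key = minutes past midnight (1440 rows).
    conn.execute(
        "CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, hour INTEGER, minute INTEGER)"
    )
    conn.executemany(
        "INSERT INTO dim_time VALUES (?, ?, ?)",
        [(k, k // 60, k % 60) for k in range(24 * 60)],
    )
    # Fact table keeps the full timestamp (ISO-8601 text) and the time-of-day key.
    conn.execute("CREATE TABLE fact_sales (ts TEXT, time_key INTEGER, amount INTEGER)")
    facts = [
        ("2010-03-22 10:05:00", 10 * 60 + 5, 5),
        ("2010-03-22 23:45:00", 23 * 60 + 45, 7),
        ("2010-03-23 00:10:00", 0 * 60 + 10, 3),
        ("2010-03-23 10:59:00", 10 * 60 + 59, 2),
        ("2010-03-23 11:30:00", 11 * 60 + 30, 4),
    ]
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", facts)
    return conn


# "Every day between 10:00:00 and 10:59:59" via the dimension attribute.
HOUR_VIA_DIMENSION = """
    SELECT SUM(f.amount) FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_key = f.time_key
    WHERE t.hour = 10
"""

# Function-based equivalent: the hour is extracted from every row's timestamp.
HOUR_VIA_FUNCTION = """
    SELECT SUM(amount) FROM fact_sales
    WHERE CAST(strftime('%H', ts) AS INTEGER) = 10
"""

# Range crossing midnight: simplest on the raw timestamp column. SQLite compares
# these values as text, so the bounds use the same full 'YYYY-MM-DD HH:MM:SS' form.
CROSS_MIDNIGHT = """
    SELECT SUM(amount) FROM fact_sales
    WHERE ts BETWEEN '2010-03-22 23:30:00' AND '2010-03-23 11:15:00'
"""

if __name__ == "__main__":
    conn = build_demo()
    for name, sql in [("dimension", HOUR_VIA_DIMENSION),
                      ("function", HOUR_VIA_FUNCTION),
                      ("cross-midnight", CROSS_MIDNIGHT)]:
        print(name, conn.execute(sql).fetchone()[0])
```

Both hour queries return 7 and the cross-midnight range returns 12 on this sample data. Whether the join is actually faster than the function-based filter depends on the engine and indexing, so it is worth measuring on your own data and database (Infobright in the question).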

Usually, a time dimension has minute resolution, so it has 1440 rows (24 × 60).

Damir Sudarevic, answered Oct 02 '22 12:10