 

What is the advantage of using a date dimension table over directly storing a date?

I need to store a fairly large history of data, and I have been researching the best ways to store such an archive. It seems that a data warehouse approach is what I need. It also seems highly recommended to use a date dimension table rather than storing the date itself. Can anyone explain why a separate table would be better? I don't need to summarize any of the data, just access it quickly and efficiently for any given day in the past. I'm sure I'm missing something, but I can't see how storing the dates in a separate table is any better than storing a date directly in my archive.

I have found these enlightening posts, but nothing that quite answers my question.

  • What should I have in mind when building OLAP solution from scratch?
  • Date Table/Dimension Querying and Indexes
  • What is the best way to store historical data in SQL Server 2005/2008?
  • How to create history fact table?
RubberDuck asked Feb 14 '23 00:02



2 Answers

Well, one advantage is that, as a dimension, the other table can store many other attributes of the date: is it a holiday, is it a weekday, what fiscal quarter is it in, what is the UTC offset for a specific time zone (or several), and so on. Some of those you could calculate at runtime, but in a lot of cases it's better (or only possible) to pre-calculate.
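As a rough illustration of pre-calculating such attributes, here is a minimal Python sketch that builds date dimension rows. The column names (`date_key`, `is_weekday`, `fiscal_quarter`, and so on), the `YYYYMMDD` integer key, and the `build_date_dimension` helper are all assumptions made for the example, not something prescribed by any particular warehouse design.

```python
from datetime import date, timedelta

def build_date_dimension(start, end, holidays=frozenset(), fiscal_year_start_month=1):
    """Return one dict per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        # Months elapsed since the start of the fiscal year (0 to 11).
        fiscal_month_index = (day.month - fiscal_year_start_month) % 12
        rows.append({
            # Hypothetical surrogate key in YYYYMMDD form, e.g. 20230214.
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "full_date": day,
            "iso_weekday": day.isoweekday(),        # 1 = Monday ... 7 = Sunday
            "is_weekday": day.weekday() < 5,        # Monday to Friday
            "is_holiday": day in holidays,          # holiday list is supplied by the caller
            "fiscal_quarter": fiscal_month_index // 3 + 1,
        })
        day += timedelta(days=1)
    return rows

# Example: calendar year 2023 with a fiscal year starting in July (an assumption).
dim = build_date_dimension(
    date(2023, 1, 1),
    date(2023, 12, 31),
    holidays={date(2023, 1, 1)},
    fiscal_year_start_month=7,
)
```

The holiday flag is the kind of attribute that cannot be derived from the date alone, which is why it has to be supplied and stored rather than computed at query time.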

Another is that if you just store the DATE in the table, you have only one option for indicating a missing date (NULL); otherwise you need to start making up meaningless token dates like 1900-01-01 to mean one thing (missing because you don't know) and 1899-12-31 to mean another (missing because the task is still running, the person is still alive, etc.). If you use a dimension, you can have multiple rows that represent specific reasons why the DATE is unknown or missing, without any "magic" values.
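A small sketch of that idea, using Python's built-in `sqlite3` module only so the example is runnable on its own. The table names (`dim_date`, `fact_task`), the negative keys, and the description strings are hypothetical; the point is only that each "no date" reason gets its own dimension row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,
    full_date   TEXT,              -- NULL for the special "no date" rows
    description TEXT NOT NULL
);
CREATE TABLE fact_task (
    task_id      INTEGER PRIMARY KEY,
    end_date_key INTEGER NOT NULL REFERENCES dim_date(date_key)
);
-- Two special rows, one per reason the date is missing, plus one real date.
INSERT INTO dim_date VALUES
    (-2, NULL, 'Still running'),
    (-1, NULL, 'Unknown'),
    (20230214, '2023-02-14', 'Calendar date');
INSERT INTO fact_task VALUES (1, 20230214), (2, -1), (3, -2);
""")

# The fact table never stores NULL or a token date; the join explains each case.
rows = conn.execute("""
    SELECT f.task_id, d.full_date, d.description
    FROM fact_task AS f
    JOIN dim_date AS d ON d.date_key = f.end_date_key
    ORDER BY f.task_id
""").fetchall()

for task_id, full_date, description in rows:
    print(task_id, full_date, description)
```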

Personally, I would prefer to just store a DATE, because it is smaller than an INT (in SQL Server, 3 bytes versus 4) and it keeps all kinds of date-related properties, such as the ability to perform date math. If the reason the date is missing is important, I could always add a column to the table to indicate that. But I am answering with someone else's data warehousing hat on.

Aaron Bertrand answered Feb 16 '23 18:02



Let's say you've got a thousand entries per day for the last year. If you have a date dimension, your query finds the date in the date dimension and then uses the join to collect the one thousand entries you're interested in. If there's no date dimension, and the date column isn't indexed, your query reads all 365 thousand rows to find the one thousand you want. The dimension lookup is quicker and more efficient.
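How many rows actually get read depends on indexing as much as on the table layout. The sketch below uses SQLite through Python's `sqlite3` module purely as a stand-in (an assumption; other engines such as SQL Server plan queries differently) and a scaled-down archive of 10 rows per day. The table and index names are hypothetical. It compares the query plan for a single-day lookup on a plain date column before and after that column is indexed.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive (id INTEGER PRIMARY KEY, entry_date TEXT NOT NULL, payload TEXT)"
)

# One year of data, 10 rows per day (scaled down from the thousand in the example above).
start = date(2022, 1, 1)
conn.executemany(
    "INSERT INTO archive (entry_date, payload) VALUES (?, ?)",
    (
        ((start + timedelta(days=d)).isoformat(), f"row {n}")
        for d in range(365)
        for n in range(10)
    ),
)

def plan(sql, params):
    # The fourth column of EXPLAIN QUERY PLAN output is the readable plan step.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

query = "SELECT payload FROM archive WHERE entry_date = ?"
day = ("2022-06-15",)

plan_without_index = plan(query, day)   # expected to be a full table scan
conn.execute("CREATE INDEX ix_archive_entry_date ON archive (entry_date)")
plan_with_index = plan(query, day)      # expected to search via the new index

matching_rows = len(conn.execute(query, day).fetchall())
print(plan_without_index)
print(plan_with_index)
print(matching_rows)
```

The exact wording of the plan text varies between SQLite versions, so it is worth running this against your own engine and data rather than relying on the output here.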

OTTA answered Feb 16 '23 17:02
