Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is datekey in fact tables always INT?

Tags:

I'm looking at the datekey column from the fact tables in AdventureWorksDW and they're all of type int.

Is there a reason for this and not of type date?

I understand that creating a clustered index composed of an INT would optimize query speed. But let's say I want to get data from this past week. I can subtract 6 from date 20170704 and I'll get 20170698 which is not a valid date. So I have to cast everything to date, subtract, and then cast as int.

Right now I have a foreign key constraint to make sure that something besides 'YYYYMMDD' isn't inserted. It wouldn't be necessary with a Date type. Just now, I wanted to get some data between 6/28 and 7/4. I can't just subtract six from `20170703'; I have to cast from int to date.

It seems like a lot of hassle and not many benefits.

Thanks.

like image 391
fdkgfosfskjdlsjdlkfsf Avatar asked Jul 05 '17 19:07

fdkgfosfskjdlsjdlkfsf


1 Answers

Yes, you could be using a Date data type and have that as your primary key in the Fact and the dimension and you're going to save yourself a byte in the process.

And then you're going to have to deal with a sale that is recorded and we didn't know the date. What then? In a "normal" dimensional model, you define Unknown surrogate values so that people know there is data and it might be useful but it's incomplete. A common convention is to make it zero or in the negative realm. Easy to do with integers.

Dates are a little weird in that we typically use smart keys - yyyymmdd. From a debugging perspective, it's easy to quickly identify what the date is without having to look up against your dimension.

You can't make an invalid date. Soooo what then? Everyone "knows" that 1899-12-31 is the "fake" date (or whatever tickles your fancy) and that's all well and good until someone fat fingers a date and magically hit your sentinel date and now you've got valid unknowns mixed with merely bad data entry.

If you're doing date calculations against an smart key, you're doing it wrong. You need to go to your data dimension to properly resolve the value and use methods that are aware of date logic because it's ugly and nasty beyond just simple things like month lengths and leap year calculations.

like image 190
billinkc Avatar answered Oct 11 '22 13:10

billinkc