In Scaling Up Your Data Warehouse with SQL Server 2008 R2, the author recommends using an integer date key in the format of YYYYMMDD as a clustered index on your fact tables to help optimize query speed.
What is the best way to convert your key date field to the Date Key? I feel the following would work, but is a bit sloppy:
select Replace(CONVERT(varchar,GETDATE(),102),'.','')
Clearly, I'm not using getdate, but rather a date column in the table that will be using in my aggregations.
First, how would you suggest making this conversion? Is my idea acceptable?
Second, has anyone had much success using the Date Key as a clustered index?
Assuming you are looking for the "time" analogy to the "date" portion of your code which takes YYYYMMDD and turns it into an INT , you can: start with the HH:mm:ss format given by the style number 108. remove the colons to get that string into HHmmss. then convert that to INT.
strftime() object. In this method, we are using strftime() function of datetime class which converts it into the string which can be converted to an integer using the int() function. Returns : It returns the string representation of the date or time object.
In our SQL Server data warehouse, our date or time primary keys are integers, which are also FK's into our Date and Time dimensions. For example: DateKey: 20130401 --> 1st April 2013. TimeKey: 5 --> 00:00:06. Sometimes we need to combine these back into a date or datetime, for example for calculating time differences.
ISO long (112) would do the trick:
SELECT CONVERT(INT, CONVERT(VARCHAR(8), GETDATE(), 112))
Casting getdate() straight to int with ISO 112 gives 41008 for some reason, but going via a VARCHAR seems to work - i'll update if i think of a faster cast.
EDIT: In regards to the int only vs varchar debate, here are my findings (repeatable on my test rig & production server) Varchar method uses less cpu time for half a million casts but a fraction slower overall - negligible unless your dealing with billions of rows
EDIT 2: Revised test case to clear cache and differnt dates
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT YEAR([Date]) * 10000 + MONTH([Date]) * 100 + DAY([Date])
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 218 ms, elapsed time = 255ms.
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
SET STATISTICS TIME ON;
WITH RawDates ( [Date] )
AS ( SELECT TOP 500000
DATEADD(DAY, N, GETDATE())
FROM TALLY
)
SELECT CONVERT(INT, CONVERT(VARCHAR(8), [Date], 112))
FROM RawDates
SET STATISTICS TIME OFF
(500000 row(s) affected)
SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 602ms
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With