I need to examine 30 million records (one month worth) of daily ticket validations (Unix datetime) and divy them up into 24 one-hour periods for 211 stations.
First I created a view which selects the month I’m looking for (and the equipment type) then creates a Windows Datetime value for each Unix datetime.
SELECT TOP (100) PERCENT StationName AS Station, MainTable.UnixDateTime AS ValTime,
DATEADD(s, MainTable.UnixDateTime, CONVERT(DATETIME, '1970-01-01 00:00:00', 102)) AS WinTime
FROM MainTable
INNER JOIN StationName ON MainTable.StationID = StationName.StationID
WHERE (StationName.ValidStationCode = 32767) [use only valid stations]
AND (MainTable. UnixDateTime >= 1264996800)
AND (MainTable. UnixDateTime < 1267416000)
AND (MainTable.EquipmentCode IN (33, 36)) [examine only this equipment]
ORDER BY Station
I then run the main procedure which uses a select statement on this view for each of the one-hour periods. That’s 24 select statements for each of 211 stations.
Example 1)
Update table Set [0102]= (select count(ValTime)
from view
where Station = @thisStation and DatePart (Hour, WinTime)>= 1 and DatePart (Hour, WinTime)< 2)
from view
where table.Station = @thisStation
Example 2)
Update table Set [0203]= (select count(ValTime)
from view
where Station = @thisStation and DatePart (Hour, WinTime)>= 2 and DatePart (Hour, WinTime)< 3)
from view
where table.Station = @thisStation
etc.
The procedure works - Yay! It takes 11 hours - Boo :(
Output e.g.:
0001 0102 0203 0304...2324
Station1 27 34 567 231... 123
Station2 245 57 23 198... 21
etc.
The output table has 25 columns.
There is a faster, better way of doing this but I don't know what it is. (Buy a supercomputer?) I've thought about Rank and partition but can't visualize how that would work. Any gurus out there?
The DatePart function is probably murdering your performance. I think it would speed things up greatly to break the date into day, hour, minute columns. Add indexes to those columns and you can query off literal values rather than computed values.
Unix times are stored as integers. I think you will get better performance if you handle them as integers instead of converting to DateTime. For example, to get the hour from a unix date time, you can do this....
Select (UnixDateTime % 86400) / 3600 As Hour
I'm very curious to see what the performance would be like if you ran this query:
Select StationName As Station,
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 0 Then MainTable.UnixDateTime End) As [0001],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 1 Then MainTable.UnixDateTime End) As [0102],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 2 Then MainTable.UnixDateTime End) As [0203],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 3 Then MainTable.UnixDateTime End) As [0304],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 4 Then MainTable.UnixDateTime End) As [0405],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 5 Then MainTable.UnixDateTime End) As [0506],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 6 Then MainTable.UnixDateTime End) As [0607],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 7 Then MainTable.UnixDateTime End) As [0708],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 8 Then MainTable.UnixDateTime End) As [0809],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 9 Then MainTable.UnixDateTime End) As [0910],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 10 Then MainTable.UnixDateTime End) As [1011],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 11 Then MainTable.UnixDateTime End) As [1112],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 12 Then MainTable.UnixDateTime End) As [1213],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 13 Then MainTable.UnixDateTime End) As [1314],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 14 Then MainTable.UnixDateTime End) As [1415],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 15 Then MainTable.UnixDateTime End) As [1516],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 16 Then MainTable.UnixDateTime End) As [1617],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 17 Then MainTable.UnixDateTime End) As [1718],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 18 Then MainTable.UnixDateTime End) As [1819],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 19 Then MainTable.UnixDateTime End) As [1920],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 20 Then MainTable.UnixDateTime End) As [2021],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 21 Then MainTable.UnixDateTime End) As [2122],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 22 Then MainTable.UnixDateTime End) As [2223],
Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 23 Then MainTable.UnixDateTime End) As [2324]
FROM MainTable
INNER JOIN StationName ON MainTable.StationID = StationName.StationID
WHERE (StationName.ValidStationCode = 32767) --[use only valid stations]
AND (MainTable.UnixDateTime >= 1264996800)
AND (MainTable.UnixDateTime < 1267416000)
AND (MainTable.EquipmentCode IN (33, 36)) --[examine only this equipment]
Group By StationName
ORDER BY Station
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With