Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

30 million records. Divy them up into 24 hourly-periods. Sum them for one month. Rinse and Repeat

I need to examine 30 million records (one month worth) of daily ticket validations (Unix datetime) and divy them up into 24 one-hour periods for 211 stations.

First I created a view which selects the month I’m looking for (and the equipment type) then creates a Windows Datetime value for each Unix datetime.

SELECT TOP (100) PERCENT StationName AS Station, MainTable.UnixDateTime AS ValTime, 
DATEADD(s, MainTable.UnixDateTime, CONVERT(DATETIME, '1970-01-01 00:00:00', 102)) AS WinTime
FROM  MainTable 
INNER JOIN StationName ON MainTable.StationID = StationName.StationID
WHERE (StationName.ValidStationCode = 32767) [use only valid stations] 
AND (MainTable. UnixDateTime >= 1264996800) 
AND (MainTable. UnixDateTime < 1267416000) 
AND (MainTable.EquipmentCode IN (33, 36)) [examine only this equipment]
ORDER BY Station

I then run the main procedure which uses a select statement on this view for each of the one-hour periods. That’s 24 select statements for each of 211 stations.

Example 1)

Update table Set [0102]= (select count(ValTime) 
from view 
where Station = @thisStation and DatePart (Hour, WinTime)>= 1 and DatePart (Hour, WinTime)< 2)
from view 
where table.Station = @thisStation

Example 2)

Update table Set [0203]= (select count(ValTime) 
from view 
where Station = @thisStation and DatePart (Hour, WinTime)>= 2 and DatePart (Hour, WinTime)< 3) 
from view 
where table.Station = @thisStation

etc.

The procedure works - Yay! It takes 11 hours - Boo :(

Output e.g.:

         0001 0102 0203 0304...2324
Station1   27  34   567  231... 123
Station2  245  57   23   198... 21
etc.

The output table has 25 columns.

There is a faster, better way of doing this but I don't know what it is. (Buy a supercomputer?) I've thought about Rank and partition but can't visualize how that would work. Any gurus out there?

like image 842
Ron Van Wegen Avatar asked Mar 04 '10 01:03

Ron Van Wegen


2 Answers

The DatePart function is probably murdering your performance. I think it would speed things up greatly to break the date into day, hour, minute columns. Add indexes to those columns and you can query off literal values rather than computed values.

like image 135
Dave Swersky Avatar answered Nov 14 '22 10:11

Dave Swersky


Unix times are stored as integers. I think you will get better performance if you handle them as integers instead of converting to DateTime. For example, to get the hour from a unix date time, you can do this....

Select (UnixDateTime % 86400) / 3600 As Hour

I'm very curious to see what the performance would be like if you ran this query:

Select  StationName As Station,
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 0 Then MainTable.UnixDateTime End) As [0001],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 1 Then MainTable.UnixDateTime End) As [0102],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 2 Then MainTable.UnixDateTime End) As [0203],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 3 Then MainTable.UnixDateTime End) As [0304],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 4 Then MainTable.UnixDateTime End) As [0405],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 5 Then MainTable.UnixDateTime End) As [0506],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 6 Then MainTable.UnixDateTime End) As [0607],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 7 Then MainTable.UnixDateTime End) As [0708],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 8 Then MainTable.UnixDateTime End) As [0809],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 9 Then MainTable.UnixDateTime End) As [0910],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 10 Then MainTable.UnixDateTime End) As [1011],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 11 Then MainTable.UnixDateTime End) As [1112],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 12 Then MainTable.UnixDateTime End) As [1213],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 13 Then MainTable.UnixDateTime End) As [1314],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 14 Then MainTable.UnixDateTime End) As [1415],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 15 Then MainTable.UnixDateTime End) As [1516],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 16 Then MainTable.UnixDateTime End) As [1617],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 17 Then MainTable.UnixDateTime End) As [1718],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 18 Then MainTable.UnixDateTime End) As [1819],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 19 Then MainTable.UnixDateTime End) As [1920],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 20 Then MainTable.UnixDateTime End) As [2021],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 21 Then MainTable.UnixDateTime End) As [2122],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 22 Then MainTable.UnixDateTime End) As [2223],
        Count(Case When (MainTable.UnixDateTime % 86400) / 3600 = 23 Then MainTable.UnixDateTime End) As [2324]
FROM  MainTable 
INNER JOIN StationName ON MainTable.StationID = StationName.StationID
WHERE (StationName.ValidStationCode = 32767) --[use only valid stations] 
AND (MainTable.UnixDateTime >= 1264996800) 
AND (MainTable.UnixDateTime < 1267416000) 
AND (MainTable.EquipmentCode IN (33, 36)) --[examine only this equipment]
Group By StationName
ORDER BY Station
like image 3
George Mastros Avatar answered Nov 14 '22 12:11

George Mastros