Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to structure an index for group by in Sql Server

The following simple query takes a very long time (several minutes) to execute.

I have an index:

create index IX on [fctWMAUA] (SourceSystemKey, AsAtDateKey)
SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
FROM [fctWMAUA] (NOLOCK) AS [t0]
WHERE SourceSystemKey in (1,2,3,4,5,6,7,8,9)
GROUP BY [t0].[SourceSystemKey]

The statistics are as follows:

  • logical reads 1827978
  • physical reads 1113
  • read aheads 1806459

Taking that exact same query and reformatting it as follows gives me these statistics:

  • logical reads 36
  • physical reads 0
  • read aheads 0

It takes 31ms to execute.

SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 1
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 2
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 3
 GROUP BY [t0].[SourceSystemKey]
/* AND SO ON TO 9 */

How do I make an index that does the group by quickly?

like image 888
Craig Avatar asked Nov 04 '09 11:11

Craig


2 Answers

Its difficult to say without looking at an execution plan, however you might want to try the following:

SELECT * FROM
(
    SELECT MAX(t0.AsAtDateKey) AS [Date], t0.SourceSystemKey AS SourceSystem
    FROM fctWMAUA (NOLOCK) AS t0
    GROUP BY t0.SourceSystemKey
)
WHERE SourceSystem in (1,2,3,4,5,6,7,8,9)

Its difficult to tell without looking at an execution plan, but I think that whats happening is that SQL server is not clever enough to realise that the WHERE clause specified is filtering out the groups, and does not have any effect on the records included for each group. As soon as SQL server realises this its free to use some more inteligent index lookups to work out the maximum values (which is whats happening in your second query)

Just a theory, but it might be worth a try.

like image 194
Justin Avatar answered Nov 15 '22 21:11

Justin


I have found that the best solution is the following. It mimics the union version of the query, and runs very quickly.

40 logical reads, and an execution time of 3ms.

SELECT [t3].[value]
FROM [dimSourceSystem] AS [t0]
OUTER APPLY (
    SELECT MAX([t2].[value]) AS [value]
    FROM (
        SELECT [t1].[AsAtDateKey] AS [value], [t1].[SourceSystemKey]
        FROM [fctWMAUA] AS [t1]
        ) AS [t2]
    WHERE [t2].[SourceSystemKey] = ([t0].[SourceSystemKey])
    ) AS [t3]
like image 33
Craig Avatar answered Nov 15 '22 22:11

Craig