I'm finding that, on an indexed view with appropriate indexes, MAX(date) is performing an entire index scan followed by a stream aggregate whereas TOP (1) date is optimally using the index and only scanning a single row. For large numbers of rows this is leading to serious performance issues. I've included some code to demonstrate the issue below but would be interested to know if others can explain why this behaviour is occuring (it does not occur on a table with similar index) and whether it is a bug in SQL Server's optimiser (I've tested on both 2008 SP2 and on R2, and both show the same issues).
CREATE TABLE dbo.TableWithDate
(
id INT IDENTITY(1,1) PRIMARY KEY,
theDate DATE NOT NULL
);
CREATE NONCLUSTERED INDEX [ix_date] ON dbo.TableWithDate([theDate] DESC);
INSERT INTO dbo.TableWithDate(theDate) VALUES('1 MAR 2010'),('1 MAR 2010'), ('3 JUN 2008');
-- Test 1: max vs top(1) on the table. They give same optimal plan (scan one row from the index, since index is in order)
SELECT TOP(1) theDate FROM dbo.TableWithDate ORDER BY theDate DESC;
SELECT MAX(theDate) FROM dbo.TableWithDate;
CREATE TABLE dbo.TheJoinTable
(
identId INT IDENTITY(1,1) PRIMARY KEY,
foreignId INT NOT NULL,
someValue INT NOT NULL
);
CREATE NONCLUSTERED INDEX [ix_foreignValue] ON dbo.TheJoinTable([foreignId] ASC);
INSERT INTO dbo.TheJoinTable(foreignId,someValue) VALUES (1,10),(1,20),(1,30),(2,5),(3,6),(3,10);
GO
CREATE VIEW dbo.TheTablesJoined
WITH SCHEMABINDING
AS
SELECT T2.identId, T1.id, T1.theDate, T2.someValue
FROM dbo.TableWithDate AS T1
INNER JOIN dbo.TheJoinTable AS T2 ON T2.foreignId=T1.id
GO
-- Notice the different plans: the TOP one does a scan of 1 row from each and joins
-- The max one does a scan of the entire index and then does seek operations for each item (less efficient)
SELECT TOP(1) theDate FROM dbo.TheTablesJoined ORDER BY theDate DESC;
SELECT MAX(theDate) FROM dbo.TheTablesJoined;
-- But what about if we put an index on the view? Does that make a difference?
CREATE UNIQUE CLUSTERED INDEX [ix_clust1] ON dbo.TheTablesJoined([identId] ASC);
CREATE NONCLUSTERED INDEX [ix_dateDesc] ON dbo.TheTablesJoined ([theDate] DESC);
-- No!!!! We are still scanning the entire index (look at the actual number of rows) in the MAX case.
SELECT TOP(1) theDate FROM dbo.TheTablesJoined ORDER BY theDate DESC;
SELECT MAX(theDate) FROM dbo.TheTablesJoined;
John Sansom covered the performance characteristics of MAX
vs. TOP
, however his results didn't specifically answer your question.
I think the answer lies in the fact that MAX
is a general purpose aggregate function geared toward crunching pages and pages of data, where TOP
is an operator geared toward only restricting the number of rows being fetched.
In this one narrow use case, both example queries are capable of going after the same thing, and can return the same result. The query using TOP
is benefiting from the specific optimizations afforded by using that method for this use case.
I dumped out the XML plans for both queries, and the statement using MAX
contained:
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1004" />
<ScalarOperator ScalarString="MAX([db].[dbo].[TheTablesJoined].[theDate])">
<Aggregate AggType="MAX" Distinct="false">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[db]" Schema="[dbo]" Table="[TheTablesJoined]" Column="theDate" />
</Identifier>
</ScalarOperator>
</Aggregate>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
The statement using TOP
contained this in place of the XML defining what was being aggregated in the MAX
query:
<TopExpression>
<ScalarOperator ScalarString="(1)">
<Const ConstValue="(1)" />
</ScalarOperator>
</TopExpression>
There's a lot less going on in the execution plan when using TOP
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With