SQL Server Index Order (datetime field)

Tags:

I have a question about SQL Server indexes. I'm not a DBA and assume the answer is clear for those of you that are. I am using SQL Server 2008.

I have a table that is similar to the following (but has more columns):

CREATE TABLE [dbo].[Results](
    [ResultID] [int] IDENTITY(1,1) NOT NULL,
    [TypeID] [int] NOT NULL,
    [ItemID] [int] NOT NULL,
    [QueryTime] [datetime] NOT NULL,
    [ResultTypeID] [int] NOT NULL,
    [QueryDay]  AS (datepart(day,[querytime])) PERSISTED,
    [QueryMonth]  AS (datepart(month,[querytime])) PERSISTED,
    [QueryYear]  AS (datepart(year,[querytime])) PERSISTED,
 CONSTRAINT [PK_Results] PRIMARY KEY CLUSTERED 
(
    [ResultID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

The important fields to notice here are ResultID, the primary key, and QueryTime the datetime at which the result was produced.

I also have the following index (amongst others):

CREATE NONCLUSTERED INDEX [IDX_ResultDate] ON [dbo].[Results] 
(
    [QueryTime] ASC
)
INCLUDE ( [ResultID],
[ItemID],
[TypeID]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]

In a database where I have about a million rows in the table, the index is used when doing a query such as:

select top 1 * from results where querytime>'2009-05-01' order by ResultID asc

In another instance of the same database, with 50 million rows, SQL Server decides not to use the index as it rather does a Clustered Index Scan which ends up being horribly slow. (and speed depends on the date). Even if I use query hints to make it use IDX_ResultDate, it is still a bit slow and it spends 94% of its time sorting by ResultID. I figured that by creating an index with both ResultID and QueryTime as sorted columns in the index, I could speed up my query.

I therefore created the following:

CREATE NONCLUSTERED INDEX [IDX_ResultDate2] ON [dbo].[Results] 
(
[QueryTime] ASC,    
[ResultID] ASC
)
INCLUDE ( [ItemID],
[TypeID]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO

I assumed that it would first use the sort by QueryTime to find the matching results, which would already be sorted by ResultID. However, this is not the case as this index changes nothing in performance over the existing one.

I then tried the following index:

CREATE NONCLUSTERED INDEX [IDX_ResultDate3] ON [dbo].[Results] 
(
    [ResultID] ASC,
    [QueryTime] ASC
)
INCLUDE ( [ItemID],
[TypeID]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
GO

This one produces the intended result. It appears to return in constant time (a fraction of a second).

However, I am puzzled at why IDX_ResultDate3 works well whereas IDX_ResultDate2 doesn't.

I would assume that a binary search in as sorted list of QueryTime followed by peeking at the first result in it's child list of ResultIDs is the fastest way at getting the result. (Hence my initial sort order).

Side question: Should I create a persisted column with the date portion of QueryTime and index on that instead (I already have three persisted columns as you can see above)?

812

asked Jul 09 '09 17:07

Jason Kealey

2 Answers

I would assume that a binary search in as sorted list of QueryTime followed by peeking at the first result in it's child list of ResultIDs is the fastest way at getting the result. (Hence my initial sort order).

That would be fast indeed, but your query expresses a different request: You are asking for the Result with the minimal ResultId from all the queries that occurred after '2009-05-01'. To satisfy the request it has to seek at the beginning of the range ('2009-05-01'), start a scan from this position to extract all the ResultId, sort them then return the top 1 (the minimum ResultId). The second index you added [idx_ResultDate2] doesn't help much either. The query has to do pretty much exactly the same seek and scan: the ResultIds are sorted whithin a result date, so to find out the top ResultId from all results that are after '2009-05-01' the query still has to scan the index till end.

On your last index, [IDX_ResultDate3], the query is cheating. What it does it starts a scan on the inde and its looking at the QueryTime value, knowing that in this index scan the first Result that has a QueryTime in the desired range (> '2009-05-01') is the one you want (because the ResultId is guaranteed to be the Top 1). You get the result in a 'fraction of a second' out of pure luck: you have a matching Result in the beginning of the index. The query may well scan the entire index and match the very lat Result. You can insert a new Result with a QueryTime like '2010-01-01' and then seek for it, you'll see that the performance degrades as the query has to scan the entire index till end (still faster than a table scan because of the narrower index size).

My question is: are you absolutely positive that your query has to return TOP 1 in ORDER BY ResultID? Or you just picked the order by arbitrarily? If you can change the ORDER BY request to, say, QueryTime, then any of the index (updated: with QueryTime as leftmost column) will return a simple Seek and Fetch, no scansn and no sorting.

133

answered Oct 19 '22 23:10

Remus Rusanu

You have a ranged filtering condition on one field along with ORDER BY another field.

An index, even a composite index, cannot be used to serve both conditions in this case.

When you create an index on (queryTime, resultId), the index is used for filtering. The engine still needs to order the resultset.

When you create an index on (resultId, queryTime), the index is used for ordering.

Since you need a TOP 1 result and the row that satisifes this result happens to be in the beginning of the index, the latter approach turns out to perform better.

If your filtering condition would be selective (i. e. it would return few rows), and the first result you need would happen to be in the end of the index, the first approarch would be better.

See this article in my blog for some more explanations and hints on which index to create in which conditions:

Choosing index

answered Oct 19 '22 23:10

Quassnoi

Related questions
                            
                                Restore database SQL server query
                            
                                Could not allocate space for object dbo.table
                            
                                Creating XML with SQL Server
                            
                                Incorrect syntax near the keyword 'case'
                            
                                SQL SERVER Agent stops working Automatically
                            
                                Error while importing bacpak in Sql Server 2012
                            
                                SqlBulkCopy into table that Default column values fails when source DataTable row has DBNull.Value
                            
                                Where IN a Comma delimited string [duplicate]
                            
                                Perform a Case insensitive Like query in a case sensitive SQL Server database
                            
                                HAVING clause: at least one of the ungrouped values is X
                            
                                Random Datetime In given Datetime Range
                            
                                Transactions in unit of work design pattern
                            
                                Subtract two dates in Microsoft SQL Server
                            
                                Creating a Connection String and Working with SQL Server LocalDB
                            
                                Determine if a supplier supplies all products in a list
                            
                                Stored Procedure Versioning
                            
                                What happens to a SQL Server 2005 table with an identity (int) that goes over maxint?
                            
                                Which versions of SQL Server does LINQ to SQL support?
                            
                                Database deployment: script or backup
                            
                                Are SQL stored procedures case sensitive?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

SQL Server Index Order (datetime field)

Tags:

database

sql-server

indexing

Jason Kealey

People also ask

2 Answers

Remus Rusanu

Quassnoi

Recent Activity

Donate For Us