I have a question about SQL Server indexes. I'm not a DBA and assume the answer is clear for those of you that are. I am using SQL Server 2008.
I have a table that is similar to the following (but has more columns):
CREATE TABLE [dbo].[Results](
[ResultID] [int] IDENTITY(1,1) NOT NULL,
[TypeID] [int] NOT NULL,
[ItemID] [int] NOT NULL,
[QueryTime] [datetime] NOT NULL,
[ResultTypeID] [int] NOT NULL,
[QueryDay] AS (datepart(day,[querytime])) PERSISTED,
[QueryMonth] AS (datepart(month,[querytime])) PERSISTED,
[QueryYear] AS (datepart(year,[querytime])) PERSISTED,
CONSTRAINT [PK_Results] PRIMARY KEY CLUSTERED
(
[ResultID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
The important fields to notice here are ResultID, the primary key, and QueryTime the datetime at which the result was produced.
I also have the following index (amongst others):
CREATE NONCLUSTERED INDEX [IDX_ResultDate] ON [dbo].[Results]
(
[QueryTime] ASC
)
INCLUDE ( [ResultID],
[ItemID],
[TypeID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
In a database where I have about a million rows in the table, the index is used when doing a query such as:
select top 1 * from results where querytime>'2009-05-01' order by ResultID asc
In another instance of the same database, with 50 million rows, SQL Server decides not to use the index as it rather does a Clustered Index Scan which ends up being horribly slow. (and speed depends on the date). Even if I use query hints to make it use IDX_ResultDate, it is still a bit slow and it spends 94% of its time sorting by ResultID. I figured that by creating an index with both ResultID and QueryTime as sorted columns in the index, I could speed up my query.
I therefore created the following:
CREATE NONCLUSTERED INDEX [IDX_ResultDate2] ON [dbo].[Results]
(
[QueryTime] ASC,
[ResultID] ASC
)
INCLUDE ( [ItemID],
[TypeID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
I assumed that it would first use the sort by QueryTime to find the matching results, which would already be sorted by ResultID. However, this is not the case as this index changes nothing in performance over the existing one.
I then tried the following index:
CREATE NONCLUSTERED INDEX [IDX_ResultDate3] ON [dbo].[Results]
(
[ResultID] ASC,
[QueryTime] ASC
)
INCLUDE ( [ItemID],
[TypeID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
This one produces the intended result. It appears to return in constant time (a fraction of a second).
However, I am puzzled at why IDX_ResultDate3 works well whereas IDX_ResultDate2 doesn't.
I would assume that a binary search in as sorted list of QueryTime followed by peeking at the first result in it's child list of ResultIDs is the fastest way at getting the result. (Hence my initial sort order).
Side question: Should I create a persisted column with the date portion of QueryTime and index on that instead (I already have three persisted columns as you can see above)?
we solved it by creating index on datetime column . yes , cluster index will give better performance over non cluster indexes because ultimately non cluster index use cluster indexes internally .
The ordering of a two-column index is therefore like the ordering of a telephone directory: it is first sorted by surname, then by first name. That means that a two-column index does not support searching on the second column alone; that would be like searching a telephone directory by first name.
Selectivity of the individual columns in a composite index does not matter when picking the order.
When creating an index, you have the option to specify whether the index is created in ascending or descending order. This can be done simply by using the key word ASC or DESC when creating the index as follows.
I would assume that a binary search in as sorted list of QueryTime followed by peeking at the first result in it's child list of ResultIDs is the fastest way at getting the result. (Hence my initial sort order).
That would be fast indeed, but your query expresses a different request: You are asking for the Result with the minimal ResultId from all the queries that occurred after '2009-05-01'. To satisfy the request it has to seek at the beginning of the range ('2009-05-01'), start a scan from this position to extract all the ResultId, sort them then return the top 1 (the minimum ResultId). The second index you added [idx_ResultDate2] doesn't help much either. The query has to do pretty much exactly the same seek and scan: the ResultIds are sorted whithin a result date, so to find out the top ResultId from all results that are after '2009-05-01' the query still has to scan the index till end.
On your last index, [IDX_ResultDate3], the query is cheating. What it does it starts a scan on the inde and its looking at the QueryTime value, knowing that in this index scan the first Result that has a QueryTime in the desired range (> '2009-05-01') is the one you want (because the ResultId is guaranteed to be the Top 1). You get the result in a 'fraction of a second' out of pure luck: you have a matching Result in the beginning of the index. The query may well scan the entire index and match the very lat Result. You can insert a new Result with a QueryTime like '2010-01-01' and then seek for it, you'll see that the performance degrades as the query has to scan the entire index till end (still faster than a table scan because of the narrower index size).
My question is: are you absolutely positive that your query has to return TOP 1 in ORDER BY ResultID? Or you just picked the order by arbitrarily? If you can change the ORDER BY request to, say, QueryTime, then any of the index (updated: with QueryTime as leftmost column) will return a simple Seek and Fetch, no scansn and no sorting.
You have a ranged filtering condition on one field along with ORDER BY
another field.
An index, even a composite index, cannot be used to serve both conditions in this case.
When you create an index on (queryTime, resultId)
, the index is used for filtering. The engine still needs to order the resultset.
When you create an index on (resultId, queryTime)
, the index is used for ordering.
Since you need a TOP 1
result and the row that satisifes this result happens to be in the beginning of the index, the latter approach turns out to perform better.
If your filtering condition would be selective (i. e. it would return few rows), and the first result you need would happen to be in the end of the index, the first approarch would be better.
See this article in my blog for some more explanations and hints on which index to create in which conditions:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With