Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server won't use my index

I have a fairly simple query:

SELECT
     col1,
     col2…
FROM
     dbo.My_Table
WHERE
     col1 = @col1 AND
     col2 = @col2 AND
     col3 <= @col3

It was performing horribly, so I added an index on col1, col2, col3 (int, bit, and datetime). When I checked the query plan it was ignoring my index. I tried reordering the columns in the index in every possible configuration and it always ignored the index. When I run the query it does a clustered index scan (table size is between 700K and 800K rows) and takes 10-12 seconds. When I force it to use my index it returns instantly. I was careful to clear the cache and buffers between tests.

Other things I’ve tried:

UPDATE STATISTICS dbo.My_Table

CREATE STATISTICS tmp_stats ON dbo.My_Table (col1, col2, col3) WITH FULLSCAN

Am I missing anything here? I hate to put an index hint in a stored procedure, but SQL Server just can’t seem to get a clue on this one. Anyone know any other things that might prevent SQL Server from recognizing that using the index is a good idea?

EDIT: One of the columns being returned is a TEXT column, so using a covering index or an INCLUDE won't work :(

like image 901
Tom H Avatar asked Jul 09 '09 19:07

Tom H


3 Answers

You have 800k rows indexed by col1, col2, col3. Col2 is a bit, so its selectivity is 50%. Col3 is a checked on a range (<=), so it's selectivity will be roughly at about 50% too. Which leaves col1. The query is compiled for the generic, parametrized plan, so it has to account for the general case. If you have 10 distinct values of col1, then your index will return approximately 800k /10 * 25% that is about ~20k keys to lookup in the clustered index to retrieve the '...' part. If you have 10k distinct col1 values then the index will return just 20 keys to look up. As you can see, what matters is not how you build your index in this case, but the actual data. Based on the selectivity of col1, the optimizer will choose a plan based on a clustered index scan (as better than 20k key lookups, each lookup at a cost of at least 3-5 page reads) or one based on the non-clustered index (if col1 is selective enough). In real life the distribution of col1 also plays a role, but going into that would complicate the explanation too much.

You can come with the benefit of hindsight and claim the plan is wrong, but the plan is the best cost estimate based on the data available at compile time. You can influence it with hints (index hint as you suggests, or optimize for hints as Quassnoi suggests) but then your query may perform better for your test set, and far worse for a different set of data, say for the case when @col1 = <the value that matches 500k records>. You can also make the index covering, thus eliminating the '...' in the projection list that require the clustered index lookup necessary, in which case the non-clustered index is always a better cost match than the clustered scan.

Kimberley Tripp has a blog article covering this subject, she calls it the 'index tipping point' which explains how come an apparently perfect candidate index is being ignored: a non-clustered index that does not cover the projection list and has poor selectivity will be seen as more costly than a clustered scan.

like image 78
Remus Rusanu Avatar answered Nov 11 '22 09:11

Remus Rusanu


SQL Server optimizer is not good in optimizing queries that use variables.

If you are sure that you will always benefit from using the index, just put a hint.

If you will put the literal values to the query instead of variables, it will pick the correct statistics and will use the index.

You may also try to put a more light hint:

OPTION (OPTIMIZE FOR (@col1 = 1, @col2 = 0, @col3 = '2009-07-09'))

, which will calculate the best execution plan for these values of the variables, using statistics, and won't stick to using index no matter what.

like image 2
Quassnoi Avatar answered Nov 11 '22 08:11

Quassnoi


The order of the index is important for this query:

CREATE INDEX MyIndex ON MyTable (col3 DESC, col2 ASC, col1 ASC)

It's not so much the ASC/DESC as that when sql server goes to match that where clause, it can match on col3 first and walk the index along that value.

like image 1
Joel Coehoorn Avatar answered Nov 11 '22 10:11

Joel Coehoorn