Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What should I do to get an Clustered Index Seek instead of Clustered Index Scan?

I've got a Stored Procedure in SQL Server 2005 and when I run it and I look at its Execution Plan I notice it's doing a Clustered Index Scan, and this is costing it the 84%. I've read that I've got to modify some things to get a Clustered Index Seek there, but I don't know what to modify.

I'll appreciate any help with this.

Thanks,

Brian

like image 477
Brian Roisentul Avatar asked Aug 31 '09 21:08

Brian Roisentul


People also ask

What is the difference between clustered index seek and scan?

Index scan means it retrieves all the rows from the table and index seek means it retrieves selective rows from the table.

Which is better index scan or index seek?

Therefore, it might be suitable for small tables, but retrieving all rows from that table is not optimal for SQL Server if it has a massive number of records. The Index Seek refers only to the qualified rows and pages, i.e., it is selective in nature. Therefore, the Index seek is faster compared to Index scans.

Is Clustered index scan better than table scan?

If your query has a RANGE operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100 ), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

In which scenario scan is better than seek?

Note however that in certain situations an index scan can be faster than an index seek (sometimes significantly faster) - usually when the table is very small, or when a large percentage of the records match the predicate.


2 Answers

W/o any detail is hard to guess what the problem is, and even whether is a problem at all. The choice of a scan instead of a seek could be driven by many factors:

  • The query expresses a result set that covers the entire table. Ie. the query is a simple SELECT * FROM <table>. This is a trivial case that would be perfectly covered by a clustred index scan with no need to consider anything else.
  • The optimizer has no alternatives:
    • the query expresses a subset of the entire table, but the filtering predicate is on columns that are not part of the clustered key and there are no non-clustred indexes on those columns either. These is no alternate plan other than a full scan.
    • The query has filtering predicates on columns in the clustred index key, but they are not SARGable. The filtering predicate usually needs to be rewritten to make it SARGable, the proper rewrite depends from case to case. A more subtle problem can appear due to implicit conversion rules, eg. the filtering predicate is WHERE column = @value but column is VARCHAR (Ascii) and @value is NVARCHAR (Unicode).
    • The query has SARGale filtering predicates on columns in the clustered key, but the leftmost column is not filtered. Ie. clustred index is on columns (foo, bar) but the WHERE clause is on bar alone.
  • The optimizer chooses a scan.
    • When the alternative is a non-clustered index then scan (or range seek) but the choice is a to use the clustered index the cause can be usually tracked down to the index tipping point due to lack of non-clustered index coverage for the query projection. Note that this is not your question, since you expect a clustered index seek, not a non-clustred index seek (assumming the question is 100% accurate and documented...)
    • Cardinality estimates. The query cost estimate is based on the clustered index key(s) statistics which provide an estimate of the cardinality of the result (ie. how many rows will match). On a simple query This cannot happen, as any estimate for a seek or range seek will be lower than the one for a scan, no matter how off the statistics are, but on a complex query, with joins and filters on multiple tables, things are more complex and the plan may include a scan where a seek was expected because the query optimizer may choose plan on which the join evaluation order is reversed to what the observer expects. The reverse order choice may e correct (most times) or may be problematic (usually due to statistics being obsolete or to parameter sniffing).
    • An ordering guarantee. A scan will produce results in a guaranteed order and elements higher on the execution tree may benefit from this order (eg. a sort or spool may be eliminated, or a merge join can be used instead of hash/nested joins). Overall the query cost is better as a result of choosing an apparently slower access path.

These are some quick pointers why a clustered index scan may be present when a clustered index seek is expected. The question is extremly generic and is impossible to give an answer 'why', other than relying on an 8 ball. Now if I take your question to be properly documented and correctly articulated, then to expect a clustered index seek it means you are searching an unique record based on a clustred key value. In this case the problem has to be with the SARGability of the WHERE clause.

like image 113
Remus Rusanu Avatar answered Oct 13 '22 22:10

Remus Rusanu


If the Query incldues more than a certain percentage of the rows in the table, the optimizer will elect to do a scan instead of a seek, because it predicts that it will require fewer disk IOs in that case (For a Seek, It needs one Disk IO per level in the index for each row it returns), whereas for a scan there is only one disk IO per row in the entire table.

So if there are, say 5 levels in the b-tree Index, then if the query will generate more than 20% of the rows in the table, it is cheaper to read the whole table than make 5 IOs for each of the 20% rows...

Can you narrow the output of the query a bit more, to reduce the number of rows returned by this step in the process? That would help it choose the seek over the scan.

like image 26
Charles Bretana Avatar answered Oct 13 '22 23:10

Charles Bretana