Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this an Index Scan and not a Index Seek?

Here's the query:

SELECT      top 100 a.LocationId, b.SearchQuery, b.SearchRank
FROM        dbo.Locations a
INNER JOIN  dbo.LocationCache b ON a.LocationId = b.LocationId
WHERE       a.CountryId = 2
AND         a.Type = 7

Location Indexes:

PK_Locations:

LocationId

IX_Locations_CountryId_Type:

CountryId, Type

LocationCache Indexes:

PK_LocationCache:

LocationId

IX_LocationCache_LocationId_SearchQuery_SearchRank:

LocationId, SearchQuery, SearchRank

Execution Plan:

enter image description here

So it's doing a Index Seek on Locations, using the covering index, cool.

But why it is doing a Index Scan on the LocationCache covering index?

That covering index has LocationId, SearchQuery, SearchRank in the index (not as "Included columns").

Hover on the index scan:

enter image description here

This query needs to go in an indexed view served by a SQL Server FTS catalogue, consumed by an autocomplete plugin, so it needs to be 100% optimized.

At the moment that above query is taking 3 seconds. It should be < 0.

Any ideas?

like image 436
RPM1984 Avatar asked Jun 30 '11 00:06

RPM1984


People also ask

What is an index scan?

An index scan occurs when the database manager accesses an index to narrow the set of qualifying rows (by scanning the rows in a specified range of the index) before accessing the base table; to order the output; or to retrieve the requested column data directly ( index-only access ).

Which is faster index seek or index scan?

The Index Seek refers only to the qualified rows and pages, i.e., it is selective in nature. Therefore, the Index seek is faster compared to Index scans. Let's understand the Index Scan Vs. Index Seek difference using examples.

What is the difference between table scan and index scan?

Table scan means iterate over all table rows. Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index. Usualy index scan is less expensive than a table scan because index is more flat than a table.

What is an index and why would you use not use an index?

The purpose of indexes is to enhance a database's performance, there are multiple scenarios where indexes should not be used. Indexes should not be used on tables containing few records. Tables that have frequent, large batch updates or insert operations.

What is the difference between SQL Server index scan and index seek?

SQL SERVER – Index Seek Vs. Index Scan (Table Scan) Index Scan retrieves all the rows from the table. Index Seek retrieves selective rows from the table. Since a scan touches every row in the table, whether or not it qualifies, the cost is proportional to the total number of rows in the table.

When to use index scanning?

I have read that when faced with huge amounts of data and/or when it returns more than 50% of the data set, the query optimizer will favor an index scan. But that table as a whole barely has 20,000 rows (19,820 to be exact), it's not a big table by any means.

What happens if there is no Index in the table scan?

In the table scan, every row of data goes into the data scanning. If we have the huge data in the table and no index is not there then the query execution cost will increase due to high volume table scan.

What happens if there is no Index in the database?

If there is no index, then you might see a Table Scan (Index Scan) in the execution plan. Index seeks are generally preferred for the highly selective queries. What that means is that the query is just requesting a fewer number of rows or just retrieving the other 10 (some documents says 15 percent) of the rows of the table.


1 Answers

It is using an Index Scan primarily because it is also using a Merge Join. The Merge Join operator requires two input streams that are both sorted in an order that is compatible with the Join conditions.

And it is using the Merge Join operator to realize your INNER JOIN because it believes that that will be faster than the more typical Nested Loop Join operator. And it is probably right (it usually is), by using the two indexes it has chosen, it has input streams that are both pre-sorted according your join condition (LocationID). When the input streams are pre-sorted like this, then Merge Joins are almost always faster than the other two (Loop and Hash Joins).

The downside is what you have noticed: it appears to be scanning the whole index in, so how can that be faster if it is reading so many records that may never be used? The answer is that Scans (because of their sequential nature) can read anywhere from 10 to 100 times as many records/second as seeks.

Now Seeks usually win because they are selective: they only get the rows that you ask for, whereas Scans are non-selective: they must return every row in the range. But because Scans have a much higher read rate, they can frequently beat Seeks as long as the ratio of Discarded Rows to Matching Rows is lower than the ratio of Scan rows/sec VS. Seek rows/sec.

Questions?


OK, I have been asked to explain the last sentence more:

A "Discarded Row" is one that the the Scan reads (because it has to read everything in the index), but that will be rejected by the Merge Join operator, because it does not have a match on the other side, possibly because the WHERE clause condition has already excluded it.

"Matching Rows" are the ones that it read that are actually matched to something in the Merge Join. These are the same rows that would have been read by a Seek if the Scan were replaced by a Seek.

You can figure out what there are by looking at the statistics in the Query Plan. See that huge fat arrow to the left of the Index Scan? That represents how many rows the optimizer thinks that it will read with the Scan. The statistics box of the Index Scan that you posted shows the Actual Rows returned is about 5.4M (5,394,402). This is equal to:

TotalScanRows = (MatchingRows + DiscardedRows)

(In my terms, anyway). To get the Matching Rows, look at the "Actual Rows" reported by the Merge Join operator (you may have to take off the TOP 100 to get this accurately). Once you know this, you can get the Discarded rows by:

DiscardedRows = (TotalScanRows - MatchingRows)

And now you can calculate the ratio.

like image 158
RBarryYoung Avatar answered Sep 19 '22 19:09

RBarryYoung