I have executed a query and included the Actual Execution Plan. There is one Hash Match that is of interest to me because it's subtree uses a Index Scan instead of an index seek. When I mouse over this Hash Match there is a section called "Probe Residual". I had assumed that this is whatever values I am joining on. Am I correct here or is there a better explanation of what that means? The second question I had is regarding the indexes it uses. In my example I am pretty sure this particular join is joining on two columns. The index that it is Scanning has both of these columns in it as well as another column that is not used in the join. I was under the impression that this would result in an Index Seek rather than a Scan. Am I mistaken on this?

This blog post will probably answer your first question. As for your second, index scans might be selected by the optimizer in a number of situations. Off the top of my head: <ul> <li>If the index is very small</li> <li>If most of the rows in the index will be selected by the query</li> <li>If you are using functions in the where clause of your query</li> </ul> For the first two cases, it's more efficient to do a scan, so the optimizer chooses it over a seek. For the third case, the optimizer has no choice.

1/ A Hash Match means that it takes a hash of columns used in an equality join, but needs to include all the other columns involved in the join (for >, etc) so that they can be checked too. This is where residual columns come in. 2/ An Index Seek can be done if it can go straight to the rows you want. Perhaps you're applying a calculation to the columns and using that? Then it will use the index as a smaller version of the data, but will still need to check every row (applying the calculation on each one).

Check out those excellent articles on execution plans on simple-talk.com: <ul> <li>Execution Plan Basics</li> <li>SQL Server Execution Plans</li> <li>Graphical Execution Plans for simple SQL queries</li> <li>Understanding more complex execution plans</li> </ul> They also have a free e-book SQL Server execution plans for download.

Question on how to read a SQL Execution plan

Tags:

sql

sql-server

sql-server-2005

sql-execution-plan

I have executed a query and included the Actual Execution Plan. There is one Hash Match that is of interest to me because it's subtree uses a Index Scan instead of an index seek. When I mouse over this Hash Match there is a section called "Probe Residual". I had assumed that this is whatever values I am joining on. Am I correct here or is there a better explanation of what that means?

The second question I had is regarding the indexes it uses. In my example I am pretty sure this particular join is joining on two columns. The index that it is Scanning has both of these columns in it as well as another column that is not used in the join. I was under the impression that this would result in an Index Seek rather than a Scan. Am I mistaken on this?

859

asked Jan 21 '10 00:01

Abe Miessler

4 Answers

A Hash Join will generally (always?) use a scan or at least a range scan. A hash join works by scanning both left and right join tables (or a range in the tables) and building an in-memory hash table that contains all values 'seen' by the scans.

What happened in your case is this: the QO noticed that it can obtain all the values of a column C from a non-clustered index that happens to contain this column (as a key or as an included column). Being a non-clustered index is probably fairly narrow, so the total amount of IO to scan the entire non-clustered index is not exaggerate. The QO also considered that the system has enough RAM to store a hash table in memory. When compared the cost of this query (a scan of a non-clustered index end-to-end for, say, 10000 pages) with the cost of a nested loop that used seeks (say 5000 probes at 2-3 pages each) the scan won as requiring less IO. Of course, is largely speculation on my part, but I'm trying to present the case from the QO point of view, and the plan is likely optimal.

Factors that contributed to this particular plan choice would be:

a large number of estimated candidates on the right side of the join
availability of the join column in a narrow non-clustered index for the left side
plenty of RAM

For a large estimate of the number of candidates, a better choice than the hash join is only the merge-join, and that one requires the input to be presorted. If both the left side can offer an access path that guarantees an order on the joined column and the right side has a similar possibility then you may end up with the merge join, which is the fastest join.

answered Nov 15 '22 09:11

Remus Rusanu

This blog post will probably answer your first question.

As for your second, index scans might be selected by the optimizer in a number of situations. Off the top of my head:

If the index is very small
If most of the rows in the index will be selected by the query
If you are using functions in the where clause of your query

For the first two cases, it's more efficient to do a scan, so the optimizer chooses it over a seek. For the third case, the optimizer has no choice.

answered Nov 15 '22 10:11

womp

1/ A Hash Match means that it takes a hash of columns used in an equality join, but needs to include all the other columns involved in the join (for >, etc) so that they can be checked too. This is where residual columns come in.

2/ An Index Seek can be done if it can go straight to the rows you want. Perhaps you're applying a calculation to the columns and using that? Then it will use the index as a smaller version of the data, but will still need to check every row (applying the calculation on each one).

answered Nov 15 '22 10:11

Rob Farley

Check out those excellent articles on execution plans on simple-talk.com:

Execution Plan Basics
SQL Server Execution Plans
Graphical Execution Plans for simple SQL queries
Understanding more complex execution plans

They also have a free e-book SQL Server execution plans for download.

answered Nov 15 '22 09:11

marc_s

Related questions
                            
                                Using LIKE in SQL with multiple search terms
                            
                                How to take max between two values in presto?
                            
                                Postgres - select non-blank non-null values from multiple ordered rows
                            
                                Why does this SQL script work as it does?
                            
                                Optimal DB structure for additional fields entity
                            
                                create sql query in c++/java?
                            
                                Retrieve value of an xml element in Oracle PL SQL
                            
                                SQL query that distinguishes between ß and ss
                            
                                What is wrong with my update statement with a join in Oracle?
                            
                                SSMS permits duplicate records in a table, but not subsequent updates
                            
                                How do You Get a Specific Value From a System.Data.DataTable Object?
                            
                                Entity Framework Generated SQL
                            
                                SQL Server Boolean Expression evaluation
                            
                                SQL Server xml string parsing in varchar field
                            
                                SQL: Insert only new rows/records into a table?
                            
                                Powershell scripts to backup SQL, SVN
                            
                                inner joins in oracle
                            
                                SQL Server CONVERT(NUMERIC(18,0), '') fails but CONVERT(INT, '') succeeds?
                            
                                How to create automatically a table populate script from current data in SQL Server?
                            
                                Fetch the row which has the Max value for a column in SQL Server

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With