In general what makes an SQL query optimiser decide between a nested loop and a hash join.
For certain types of SQL, the hash join will execute faster than a nested loop join, but the hash join uses more RAM resources. Nested loops join - The nested loops table join is one of the original table join plans and it remains the most common.
Nested Loops are used to join smaller tables. Further, nested loop join uses during the cross join and table variables. Merge Joins are used to join sorted tables. This means that Merge joins are utilized when join columns are indexed in both tables while Hash Match join uses a hash table to join equi joins.
Find nodes with a high number of rows and executions. Follow the path upwards until I find the right nested loop. Find out which join is causing the nested loop and force it to a hash join.
A Nested Loops join is a logical structure in which one loop (iteration) resides inside another one, that is to say for each iteration of the outer loop all the iterations of the inner loop are executed/processed. A Nested Loops join works in the same way.
NESTED LOOPS
are good if the condition inside the loop is sargable, that is index can be used to limit the number of records.
For a query like this:
SELECT *
FROM a
JOIN b
ON b.b1 = a.a1
WHERE a.a2 = @myvar
, with a
leading, each record from a
will be taken and all corresponding records in b
should be found.
If b.b1
is indexed and has high cardinality, then NESTED LOOP
will be a preferred way.
In SQL Server
, it is also the only way to execute non-equijoins (something other than =
condition in the ON
clause)
HASH JOIN
is the fastest method if all (or almost all) records should be parsed.
It takes all records from b
, builds a hash table over them, then takes all records from a
and uses the value of the join column as a key to look up the hash table.
NESTED LOOPS
takes this time:
Na * (Nb / C) * R
,
where Na
and Nb
are the numbers of records in a
and b
, C
is the index cardinality, and R
is constant time required for the row lookup (1
is all fields in SELECT
, WHERE
and ORDER BY
clauses are covered by the index, about 10
if they are not)
HASH JOIN
takes this time:
Na + (Nb * H)
, where H
is sum of constants required to build and lookup the hash table (per record). They are programmed into the engine.
SQL Server
computes the cardinality using the table statistics, computes and compares the two values and chooses the best plan.
Typically, it's going to be dependent on the size of the sets that are being joined.
I highly recommend reading "Inside Microsoft SQL Server 2008: T-SQL Querying" by Itzik Ben-Gan:
http://www.solidq.com/insidetsql/books/insidetsql2008/
(the 2005 edition is just as applicable on this topic as well)
He goes into your question, as well as many others when it comes to getting the most out of your queries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With