Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What makes an SQL query optimiser decide between a nested loop and a hash join

In general what makes an SQL query optimiser decide between a nested loop and a hash join.

like image 755
cindi Avatar asked Dec 29 '09 16:12

cindi


People also ask

Why is hash join better than nested loop join?

For certain types of SQL, the hash join will execute faster than a nested loop join, but the hash join uses more RAM resources. Nested loops join - The nested loops table join is one of the original table join plans and it remains the most common.

What are the differences between hash join Merge join and nested loops?

Nested Loops are used to join smaller tables. Further, nested loop join uses during the cross join and table variables. Merge Joins are used to join sorted tables. This means that Merge joins are utilized when join columns are indexed in both tables while Hash Match join uses a hash table to join equi joins.

How convert nested loop to hash join in SQL Server?

Find nodes with a high number of rows and executions. Follow the path upwards until I find the right nested loop. Find out which join is causing the nested loop and force it to a hash join.

What is nested loop join in SQL Server?

A Nested Loops join is a logical structure in which one loop (iteration) resides inside another one, that is to say for each iteration of the outer loop all the iterations of the inner loop are executed/processed. A Nested Loops join works in the same way.


2 Answers

NESTED LOOPS are good if the condition inside the loop is sargable, that is index can be used to limit the number of records.

For a query like this:

SELECT  *
FROM    a
JOIN    b
ON      b.b1 = a.a1
WHERE   a.a2 = @myvar

, with a leading, each record from a will be taken and all corresponding records in b should be found.

If b.b1 is indexed and has high cardinality, then NESTED LOOP will be a preferred way.

In SQL Server, it is also the only way to execute non-equijoins (something other than = condition in the ON clause)

HASH JOIN is the fastest method if all (or almost all) records should be parsed.

It takes all records from b, builds a hash table over them, then takes all records from a and uses the value of the join column as a key to look up the hash table.

  • NESTED LOOPS takes this time:

    Na * (Nb / C) * R,

    where Na and Nb are the numbers of records in a and b, C is the index cardinality, and R is constant time required for the row lookup (1 is all fields in SELECT, WHERE and ORDER BY clauses are covered by the index, about 10 if they are not)

  • HASH JOIN takes this time:

    Na + (Nb * H)

    , where H is sum of constants required to build and lookup the hash table (per record). They are programmed into the engine.

SQL Server computes the cardinality using the table statistics, computes and compares the two values and chooses the best plan.

like image 92
Quassnoi Avatar answered Oct 20 '22 01:10

Quassnoi


Typically, it's going to be dependent on the size of the sets that are being joined.

I highly recommend reading "Inside Microsoft SQL Server 2008: T-SQL Querying" by Itzik Ben-Gan:

http://www.solidq.com/insidetsql/books/insidetsql2008/

(the 2005 edition is just as applicable on this topic as well)

He goes into your question, as well as many others when it comes to getting the most out of your queries.

like image 21
casperOne Avatar answered Oct 19 '22 23:10

casperOne