I have a basic query that goes from 6 seconds to 1 second just by changing one join from LEFT JOIN
to LEFT HASH JOIN
or 'LEFT LOOP JOIN'. Can anyone explain why this would cause such a large increase in performance and why SQL's optimizer isn't figuring it out on it's own?
Here is roughly what the SQL looks like:
SELECT
a.[ID]
FROM
[TableA] a
LEFT HASH JOIN
[TableB] b
ON b.[ID] = a.[TableB_ID]
JOIN
[TableC] c
ON c.[ID] = a.[TableC_ID]
WHERE
a.[SomeDate] IS NULL AND
a.[SomeStatus] IN ('X', 'Y', 'Z') AND
c.[SomethingElse] = 'ABC'
Table A and B have millions of records and indexes on all the ID fields. Using SQL Server 2005.
Edit: A collegue suggested a LEFT LOOP JOIN and it seems to have made it even faster... SQL is not one of my strengths so I am trying to understand how these 'hints' are helping.
HASH JOIN
is useful when the large percent of rows contributes to the resultset.
In your case, building a HASH TABLE
on either A
or B
and scanning another table is cheaper than either performing NESTED LOOPS
over the index on B.ID
or merging the sorted resultsets which the optimizer used before the hint.
SQL Server
's optimizer did not see that: probably because you didn't gather statistics, probably because your data distribution is skewed.
Update:
Since you mentioned that LOOP JOIN
improved the speed, it may be so that the JOIN
order was chosen incorrectly by the optimizer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With