What are the advantages, if any, of explicitly doing a HASH JOIN over a regular JOIN (wherein SQL Server will decide the best JOIN strategy)? Eg:
select pd.*
from profiledata pd
inner hash join profiledatavalue val on val.profiledataid=pd.id
In the simplistic sample code above, I'm specifying the JOIN strategy, whereas if I leave off the "hash" key word SQL Server will do a MERGE JOIN behind the scenes (per the "actual execution plan").
Hash joins are typically more efficient than nested loops joins, except when the probe side of the join is very small. They require an equijoin predicate (a predicate comparing records from one table with those from the other table using a conjunction of equality operators '=' on one or more columns).
Merge joins are faster and uses less memory than hash joins. Hash join is used when projections of the joined tables are not already sorted on the join columns.
The HASH join might be faster than a SORT-MERGE join, in this case, because only one row source needs to be sorted, and it could possibly be faster than a NESTED LOOPS join because probing a hash table in memory can be faster than traversing a b-tree index.
Nested Loops are used to join smaller tables. Further, nested loop join uses during the cross join and table variables. Merge Joins are used to join sorted tables. This means that Merge joins are utilized when join columns are indexed in both tables while Hash Match join uses a hash table to join equi joins.
The optmiser does a good enough job for everyday use. However, in theory it might need 3 weeks to find the perfect plan in the extreme, so there is a chance that the generated plan will not be ideal.
I'd leave it alone unless you have a very complex query or huge amounts of data where it simply can't produce a good plan. Then I'd consider it.
But over time, as data changes/grows or indexes change etc, your JOIN hint will becomes obsolete and prevents an optimal plan. A JOIN hint can only optimise for that single query at the time of development with that set of data you have.
Personally, I've never specified a JOIN hint in any production code.
I've normally solved a bad join by changing my query around, adding/changing an index or breaking it up (eg load a temp table first). Or my query was just wrong, or I had an implicit data type conversion, or it highlighted a flaw in my schema etc.
I've seen other developers use them but only where they had complex views nested upon complex views and they caused later problems when they refactored.
Edit:
I had a conversion today where some colleagues are going to use them to force a bad query plan (with NOLOCK and MAXDOP 1) to "encourage" migration away from legacy complex nested views that one of their downstream system calls directly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With