Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server query performance - removing need for Hash Match (Inner Join)

Tags:

I have the following query, which is doing very little and is an example of the kind of joins I am doing throughout the system.

select t1.PrimaryKeyId, t1.AdditionalColumnId from TableOne t1     join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId     join TableThree t3 on t1.PrimaryKeyId = t3.ForeignKeyId     join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId     join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId where      t1.StatusId = 1     and t5.TypeId = 68 

There are indexes on all the join columns, however the performance is not great. Inspecting the query plan reveals a lot of Hash Match (Inner Joins) when really I want to see Nested Loop joins.

The number of records in each table is as follows:

select count(*) from TableOne 

= 64393

select count(*) from TableTwo 

= 87245

select count(*) from TableThree 

= 97141

select count(*) from TableFour 

= 116480

select count(*) from TableFive 

= 62

What is the best way in which to improve the performance of this type of query?

like image 803
Tim Peel Avatar asked Aug 28 '11 13:08

Tim Peel


People also ask

How can prevent hash join in SQL Server?

Hash joins are best for joins, if you really want to remove hash join create index on the joining column and it will be index join and performance will be bad.

Does inner join order matter for performance?

JOIN order doesn't matter, the query engine will reorganize their order based on statistics for indexes and other stuff.


2 Answers

First thoughts:

  1. Change to EXISTS (changes equi-join to semi-join)
  2. You need to have indexes on t1.StatusId, t5.TypeId and INCLUDE t1.AdditionalColumnID

I wouldn't worry about your join method yet...

Personally, I've never used a JOIN hint. They only work for the data, indexes and statistics you have at that point in time. As these change, your JOIN hint limits the optimiser

select t1.PrimaryKeyId, t1.AdditionalColumnId from     TableOne t1 where      t1.Status = 1     AND EXISTS (SELECT *         FROM           TableThree t3           join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId           join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId         WHERE           t1.PrimaryKeyId = t3.ForeignKeyId           AND           t5.TypeId = 68)     AND EXISTS (SELECT *         FROM           TableTwo t2         WHERE           t1.ForeignKeyId = t2.PrimaryKeyId) 

Index for tableOne.. one of

  • (Status, ForeignKeyId) INCLUDE (AdditionalColumnId)
  • (ForeignKeyId, Status) INCLUDE (AdditionalColumnId)

Index for tableFive... probably (typeID, PrimaryKeyId)

Edit: updated JOINS and EXISTS to match question fixes

like image 57
gbn Avatar answered Sep 19 '22 13:09

gbn


SQL Server is pretty good at optimizing queries, but it's also conservative: it optimizes queries for the worst case. A loop join typically results in an index lookup and a bookmark lookup for for every row. Because loop joins cause dramatic degradation for large sets, SQL Server is hesitant to use them unless it's sure about the number of rows.

You can use the forceseek query hint to force an index lookup:

inner join TableTwo t2 with (FORCESEEK) on t1.ForeignKeyId = t2.PrimaryKeyId 

Alternatively, you can force a loop join with the loop keyword:

inner LOOP join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId 

Query hints limit SQL Server's freedom, so it can no longer adapt to changed circumstances. It's best practice to avoid query hints unless there is a business need that cannot be met without them.

like image 27
Andomar Avatar answered Sep 20 '22 13:09

Andomar