I have the following query, which is doing very little and is an example of the kind of joins I am doing throughout the system.
select t1.PrimaryKeyId, t1.AdditionalColumnId from TableOne t1 join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId join TableThree t3 on t1.PrimaryKeyId = t3.ForeignKeyId join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId where t1.StatusId = 1 and t5.TypeId = 68
There are indexes on all the join columns, however the performance is not great. Inspecting the query plan reveals a lot of Hash Match (Inner Joins) when really I want to see Nested Loop joins.
The number of records in each table is as follows:
select count(*) from TableOne
= 64393
select count(*) from TableTwo
= 87245
select count(*) from TableThree
= 97141
select count(*) from TableFour
= 116480
select count(*) from TableFive
= 62
What is the best way in which to improve the performance of this type of query?
Hash joins are best for joins, if you really want to remove hash join create index on the joining column and it will be index join and performance will be bad.
JOIN order doesn't matter, the query engine will reorganize their order based on statistics for indexes and other stuff.
First thoughts:
I wouldn't worry about your join method yet...
Personally, I've never used a JOIN hint. They only work for the data, indexes and statistics you have at that point in time. As these change, your JOIN hint limits the optimiser
select t1.PrimaryKeyId, t1.AdditionalColumnId from TableOne t1 where t1.Status = 1 AND EXISTS (SELECT * FROM TableThree t3 join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId WHERE t1.PrimaryKeyId = t3.ForeignKeyId AND t5.TypeId = 68) AND EXISTS (SELECT * FROM TableTwo t2 WHERE t1.ForeignKeyId = t2.PrimaryKeyId)
Index for tableOne.. one of
(Status, ForeignKeyId) INCLUDE (AdditionalColumnId)
(ForeignKeyId, Status) INCLUDE (AdditionalColumnId)
Index for tableFive... probably (typeID, PrimaryKeyId)
Edit: updated JOINS and EXISTS to match question fixes
SQL Server is pretty good at optimizing queries, but it's also conservative: it optimizes queries for the worst case. A loop join typically results in an index lookup and a bookmark lookup for for every row. Because loop joins cause dramatic degradation for large sets, SQL Server is hesitant to use them unless it's sure about the number of rows.
You can use the forceseek
query hint to force an index lookup:
inner join TableTwo t2 with (FORCESEEK) on t1.ForeignKeyId = t2.PrimaryKeyId
Alternatively, you can force a loop join with the loop
keyword:
inner LOOP join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId
Query hints limit SQL Server's freedom, so it can no longer adapt to changed circumstances. It's best practice to avoid query hints unless there is a business need that cannot be met without them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With