
LEFT JOIN Significantly faster than INNER JOIN


I have a table (MainTable) with a bit over 600,000 records. It joins onto itself via a 2nd table (JoinTable) in a parent/child type relationship:

SELECT   Child.ID, Parent.ID
FROM     MainTable AS Child
JOIN     JoinTable
         ON Child.ID = JoinTable.ID
JOIN     MainTable AS Parent
         ON Parent.ID = JoinTable.ParentID
        AND Parent.SomeOtherData = Child.SomeOtherData

I know that every child record has a parent record and that the data in JoinTable is accurate.

When I run this query, it takes minutes to complete. However, if I join to Parent using a LEFT JOIN, it takes less than a second:

SELECT   Child.ID, Parent.ID
FROM     MainTable AS Child
JOIN     JoinTable
         ON Child.ID = JoinTable.ID
LEFT JOIN MainTable AS Parent
         ON Parent.ID = JoinTable.ParentID
        AND Parent.SomeOtherData = Child.SomeOtherData
WHERE    ...[some info to make sure we don't select parent records in the child dataset]...

I understand the difference in the results between an INNER JOIN and a LEFT JOIN. In this case it is returning exactly the same result as every child has a parent. If I let both queries run, I can compare the datasets and they are exactly the same.

Why is it that a LEFT JOIN runs so much faster than an INNER JOIN?


UPDATE: I checked the query plans. With an inner join, the plan starts with the Parent dataset; with a left join, it starts with the child dataset.

The indexes it uses are all the same.

Can I force it to always start with the child? Using a left join works, it just feels wrong.


Similar questions have been asked here before, but none seem to answer my question.

e.g. the selected answer in INNER JOIN vs LEFT JOIN performance in SQL Server says that Left Joins are always slower than Inner joins. The argument makes sense, but it's not what I'm seeing.

Greg asked Jun 14 '13 03:06


2 Answers

The LEFT JOIN seems to be faster because it forces SQL Server to do the smaller select first and then join to that smaller set of records. For some reason the optimiser doesn't choose this order on its own when both joins are inner joins.

3 ways to force the joins to happen in the right order:

  1. Select the first subset of data into a temporary table (or table variable) then join on it
  2. Use left joins (and remember that this could return different data because it's a left join not an inner join)
  3. Use the FORCE ORDER query hint, i.e. OPTION (FORCE ORDER). Note that if table sizes or schemas change, the forced join order may no longer be the best one (see https://dba.stackexchange.com/questions/45388/forcing-join-order)
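
As a rough sketch of options 1 and 3 applied to the query from the question (SQL Server syntax assumed; #ChildLinks is a hypothetical temp table name, and whatever WHERE filter the real query uses would still have to be added):

```sql
-- Option 1: materialise the child side first, then join it to the parent.
-- #ChildLinks is a made-up name used only for this illustration.
SELECT Child.ID, Child.SomeOtherData, JoinTable.ParentID
INTO   #ChildLinks
FROM   MainTable AS Child
JOIN   JoinTable
       ON Child.ID = JoinTable.ID;

SELECT c.ID AS ChildID, Parent.ID AS ParentID
FROM   #ChildLinks AS c
JOIN   MainTable AS Parent
       ON Parent.ID = c.ParentID
      AND Parent.SomeOtherData = c.SomeOtherData;

DROP TABLE #ChildLinks;

-- Option 3: keep the inner joins, but ask the optimiser to preserve
-- the join order exactly as written (Child, then JoinTable, then Parent).
SELECT Child.ID AS ChildID, Parent.ID AS ParentID
FROM   MainTable AS Child
JOIN   JoinTable
       ON Child.ID = JoinTable.ID
JOIN   MainTable AS Parent
       ON Parent.ID = JoinTable.ParentID
      AND Parent.SomeOtherData = Child.SomeOtherData
OPTION (FORCE ORDER);
```

The temp table version costs an extra write but leaves the optimiser free to choose a plan for each statement; the hint version pins the order for as long as the hint stays in the query, so it is worth re-checking the plan if the data distribution changes.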
Greg answered Nov 02 '22 22:11


Try this one. Same result, different approach:

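SELECT c.ID, p.ID
FROM (
    -- Child side: each child row with its parent link
    SELECT Child.ID, Child.SomeOtherData, JoinTable.ParentID
    FROM   MainTable AS Child
    JOIN   JoinTable
           ON Child.ID = JoinTable.ID
) AS c
INNER JOIN (
    -- Parent side: Child is not in scope inside this derived table,
    -- so the SomeOtherData comparison moves to the outer ON clause
    SELECT Parent.ID, Parent.SomeOtherData, JoinTable.ID AS ChildID
    FROM   MainTable AS Parent
    JOIN   JoinTable
           ON Parent.ID = JoinTable.ParentID
) AS p
    ON  c.ParentID = p.ID
    AND c.ID = p.ChildID   -- avoids one duplicate row per sibling
    AND c.SomeOtherData = p.SomeOtherData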

If that does not help, use a CTE:

;WITH cte AS (
    -- SomeOtherData must be selected here so the outer join can reference it
    SELECT Child.ID, Child.SomeOtherData, JoinTable.ParentID
    FROM   MainTable AS Child
    JOIN   JoinTable
           ON Child.ID = JoinTable.ID
)
SELECT cte.ID, Parent.ID
FROM   cte
INNER JOIN MainTable AS Parent
       ON Parent.ID = cte.ParentID
      AND Parent.SomeOtherData = cte.SomeOtherData
cha answered Nov 02 '22 22:11