I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution speed drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
|--- SessionRangeClass
|--- SessionRangeStream --- MO_Stream
|--- SessionRangeEnrolmentPeriod
|--- SessionRangeStudent
+----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID // + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
SQLServer2000 cost: 24.9
JOIN order doesn't matter, the query engine will reorganize their order based on statistics for indexes and other stuff.
* the queries will return the same results. But the order matters for (LEFT, RIGHT or FULL) OUTER joins. Outer joins are not commutative.
Basically, join order DOES matter because if we can join two tables that will reduce the number of rows needed to be processed by subsequent steps, then our performance will improve.
The order of the conditions in the ON clause doesn't matter.
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:
SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;
SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;
SELECT a, b, c
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;
SELECT a, b, c
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;
a b c
----------- ----------- -----------
1 1 1
2 2 NULL
3 NULL NULL
4 NULL NULL
(4 row(s) affected)
a b c
----------- ----------- -----------
1 1 1
2 NULL NULL
3 NULL 3
4 NULL NULL
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With