I have a query that basically looks like this:
Select *
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
left outer join Surcharges s on s.ContainerDetailId = cd.Id
where us.SearchDate between @beginDate and @endDate
Given certain values of @beginDate and @endDate, I have a search that takes 30 seconds to return around 100K rows.
The ultimate goal is to populate some objects that have parent-child-child-child relationships. So after some experimentation, I found that I could speed up the query dramatically with the following:
Select *
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
where us.SearchDate between @beginDate and @endDate
Select cd.Id into #cdIds
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
where us.SearchDate between @beginDate and @endDate
Select * From Surcharges s
inner join #cdIds on s.ContainerDetailId = #cdIds.Id
DROP TABLE #cdIds
This runs in 10 seconds, which makes no sense to me. Surely it should be faster just to join the Surcharges in the first place.
The Surcharge table has the following indexes:
PK:
ALTER TABLE [dbo].[Surcharges] ADD CONSTRAINT [PK_dbo.Surcharges] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
IX1:
CREATE NONCLUSTERED INDEX [IX_Surcharge_ContainerDetailId] ON [dbo].[Surcharges]
(
[ContainerDetailId] ASC
)
INCLUDE ( [Id],
[Every],
[Single],
[Column],
[About],
[Twelve],
[Of],
[Them]
)
IX2:
CREATE NONCLUSTERED INDEX [IX_ContainerDetailId] ON [dbo].[Surcharges]
(
[ContainerDetailId] ASC
)
To sum up, why is it faster to do a separate query for my Surcharges than it is to join them in the first place?
EDIT: Here are the execution plans. These are .sqlplan files that you can open in Sql Studio:
Query 1 - Combined
Query 2 - Separate
To understand what is going on, look at the actual execution plans, preferably in SQL Sentry Plan Explorer.
You'll see that your first variant has Actual Data Size = 11,272 MB in 100,276 rows.
In the second variant, the query that populates the temp table returns only 173 KB in 19,665 rows. The last query returns 1,685 MB in 87,510 rows.
11,272 MB is much more than 1,685 MB, so it is no wonder the first query is slower.
This difference is caused by two factors:
1. In the first variant you select all columns from the UserSearches, Quotes, and ContainerDetails tables, while in the second variant you select only Id from ContainerDetails. Apart from reading extra bytes from disk and transmitting them over the network, this difference results in substantially different plans. The second variant doesn't do a Sort, doesn't do a Key Lookup, and uses Hash joins instead of Nested Loops. It uses different indexes on Quotes. The second variant uses an Index Scan on ContainerDetails instead of a Seek.
2. The queries produce a different number of rows, because the first variant uses LEFT JOIN and the second uses INNER JOIN.
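To see the row-count effect of LEFT JOIN versus INNER JOIN in isolation, here is a minimal sketch using table variables. The tables @cd and @s and their contents are made up for illustration; only the column names Id and ContainerDetailId come from the question.

```sql
-- Hypothetical miniature data, not the asker's real tables.
DECLARE @cd TABLE (Id int NULL);                    -- stands in for the ContainerDetails ids
DECLARE @s  TABLE (Id int, ContainerDetailId int);  -- stands in for Surcharges

-- Id 1 has two surcharges, Id 2 has none, NULL mimics a search without a quote.
INSERT INTO @cd (Id) VALUES (1), (2), (NULL);
INSERT INTO @s (Id, ContainerDetailId) VALUES (10, 1), (11, 1);

-- LEFT JOIN keeps every row of @cd, whether or not it has a surcharge.
SELECT COUNT(*) AS LeftJoinRows
FROM @cd cd
LEFT OUTER JOIN @s s ON s.ContainerDetailId = cd.Id;

-- INNER JOIN keeps only the rows of @cd that have at least one surcharge.
SELECT COUNT(*) AS InnerJoinRows
FROM @cd cd
INNER JOIN @s s ON s.ContainerDetailId = cd.Id;
```

With this sample data the LEFT JOIN returns 4 rows and the INNER JOIN returns 2, so the two forms are not interchangeable when comparing timings.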
So, to make them comparable:
1. Instead of *, explicitly list only those columns that you need.
2. INNER JOIN (or LEFT JOIN) Surcharges in both variants.
Update
Your question was "why SQL Server would run the second query faster", the answer is: because the queries are different and they produce different results (different set of rows, different set of columns).
Now you are asking another question: how to make them the same and fast.
Which of your two variants produces correct result that you want? I'll assume that it is the second variant with temp table.
Please note that I'm not answering here how to make them fast. I'm answering how to make them the same.
The following single query should produce exactly the same results as your second variant with the temporary table, but without an explicit temporary table. I would expect its performance to be similar to that of your second variant. I deliberately wrote it using a CTE to mirror the structure of your temp-table variant, though it is easy to rewrite it without one. The optimizer should be smart enough to do that anyway.
WITH
CTE
AS
(
Select cd.Id
From
UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
where
us.SearchDate between @beginDate and @endDate
)
Select *
From
Surcharges s
inner join CTE on s.ContainerDetailId = CTE.Id
;
Well, there seem to be 2 large contributing differences in what I can see in the queries themselves as well as the plans.
First, and likely most impactful, in your second version where you are using the temp table, your final query against the Surcharges table uses an INNER JOIN instead of the LEFT JOIN you were using in the original query. I'm not sure which version is accurate, but the difference in the number of returned records seems to be very high based on the plan information (18.6 million in the first version vs. 5.1 million in the second version). If you change your first version to an INNER JOIN on the Surcharges table, do you see similar results in terms of duration?
Second, and likely less impactful, your second version is giving you a parallel execution it seems on the select...into portion of the batch. Without seeing additional stuff, I likely wouldn't dare comment on why that may be, but it is a potential differentiator.
I'd start with the first contributor and see what you wind up with and go from there.
EDIT:
To help clarify with the comments, try changing your first query to this and attaching the query plan / reviewing the results/duration of that vs. the temp table/select...into version:
Select *
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
INNER join Surcharges s on s.ContainerDetailId = cd.Id
where us.SearchDate between @beginDate and @endDate
That should hopefully give you a more-or-less similar duration as the second version - if still not, please attach the query plan for that version.
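One way to compare the durations of the two versions is to turn on the session statistics before running each one. This is only a sketch: the date values are placeholders, and the datetime type for SearchDate is an assumption, since the question does not show the table definition.

```sql
-- Placeholder dates: substitute the values that reproduce the slow run.
-- Assumes SearchDate is a datetime column (not confirmed by the question).
DECLARE @beginDate datetime = '20150101';
DECLARE @endDate   datetime = '20150201';

SET STATISTICS TIME ON;  -- reports CPU time and elapsed time per statement
SET STATISTICS IO ON;    -- reports logical and physical reads per table

SELECT *
FROM UserSearches us
LEFT OUTER JOIN Quotes q ON q.UserSearchId = us.Id AND q.QuoteNumber IS NOT NULL
LEFT OUTER JOIN ContainerDetails cd ON cd.QuoteId = q.Id
INNER JOIN Surcharges s ON s.ContainerDetailId = cd.Id
WHERE us.SearchDate BETWEEN @beginDate AND @endDate;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In Management Studio the figures appear on the Messages tab. Running the temp-table batch the same way gives numbers that can be compared side by side.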
But you are not comparing apples to apples.
The first has 3 left joins.
The second has 2 left joins and 1 inner join.
And in the second the results are split.
Try this:
Move us.SearchDate between @beginDate and @endDate up into the join.
I suspect it is doing a massive join and filtering last.
Get the date filter to happen early.
Select *
From UserSearches us
left outer join Quotes q
on q.UserSearchId = us.Id
and q.QuoteNumber is not null
and us.SearchDate between @beginDate and @endDate
left outer join ContainerDetails cd
on cd.QuoteId = q.Id
left outer join Surcharges s
on s.ContainerDetailId = cd.Id
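One thing worth checking before adopting this rewrite: with a LEFT JOIN, a predicate on the left-hand table placed in the ON clause does not remove rows from that table. It only decides which rows get a match. The sketch below uses made-up table variables (@us, @q, and their data are hypothetical) to show the two placements returning different row counts.

```sql
-- Hypothetical miniature data, not the asker's real tables.
DECLARE @us TABLE (Id int, SearchDate date);
DECLARE @q  TABLE (Id int, UserSearchId int);

INSERT INTO @us (Id, SearchDate) VALUES (1, '20150110'), (2, '20150310');
INSERT INTO @q (Id, UserSearchId) VALUES (100, 1), (200, 2);

DECLARE @beginDate date = '20150101';
DECLARE @endDate   date = '20150131';

-- Filter in WHERE: search 2 is outside the range and is removed.
SELECT COUNT(*) AS FilterInWhere
FROM @us us
LEFT OUTER JOIN @q q ON q.UserSearchId = us.Id
WHERE us.SearchDate BETWEEN @beginDate AND @endDate;

-- Filter in ON: search 2 is kept, just without a matching quote.
SELECT COUNT(*) AS FilterInOn
FROM @us us
LEFT OUTER JOIN @q q ON q.UserSearchId = us.Id
                    AND us.SearchDate BETWEEN @beginDate AND @endDate;
```

With this data the first count is 1 and the second is 2, so the rewritten LEFT JOIN query may return rows for searches outside the date range. With plain inner joins the two placements are equivalent.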
The fast search makes no sense to me
Those left joins do absolutely nothing to this
All the left does is return cd.ID = null
Select cd.Id into #cdIds
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
where us.SearchDate between @beginDate and @endDate
if you just want Surcharges then
Select s.*
From UserSearches us
join Quotes q
on q.UserSearchId = us.Id
and q.QuoteNumber is not null
and us.SearchDate between @beginDate and @endDate
join ContainerDetails cd
on cd.QuoteId = q.Id
join Surcharges s
on s.ContainerDetailId = cd.Id