Why Would An Outer Join Be Slower Than Separate Queries

I have a query that basically looks like this:

Select *
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
left outer join Surcharges s on s.ContainerDetailId = cd.Id
where us.SearchDate between @beginDate and @endDate

Given certain values of @beginDate and @endDate, I have a search that takes 30 seconds to return around 100K rows.

The ultimate goal is to populate some objects that have parent-child-child-child relationships. So after some experimentation, I found that I could speed up the query dramatically with the following:

Select *
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
where us.SearchDate between @beginDate and @endDate

Select cd.Id into #cdIds
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
where us.SearchDate between @beginDate and @endDate

Select * From Surcharges s
inner join #cdIds on s.ContainerDetailId = #cdIds.Id

DROP TABLE #cdIds

This runs in 10 seconds, which makes no sense to me. Surely it should be faster just to join the Surcharges in the first place.

The Surcharge table has the following indexes:

PK:

ALTER TABLE [dbo].[Surcharges] ADD  CONSTRAINT [PK_dbo.Surcharges] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)

IX1:

CREATE NONCLUSTERED INDEX [IX_Surcharge_ContainerDetailId] ON [dbo].[Surcharges]
(
    [ContainerDetailId] ASC
)
INCLUDE (   [Id],
[Every],
[Single],
[Column],
[About],
[Twelve],
[Of],
[Them],
)

IX2:

CREATE NONCLUSTERED INDEX [IX_ContainerDetailId] ON [dbo].[Surcharges]
(
    [ContainerDetailId] ASC
)

To sum up, why is it faster to do a separate query for my Surcharges than it is to join them in the first place?

EDIT: Here are the execution plans. These are .sqlplan files that you can open in SQL Server Management Studio:

Query 1 - Combined

Query 2 - Separate

Pharylon asked Sep 14 '15 14:09


3 Answers

To understand what is going on, look at the actual execution plans, preferably in SQL Sentry Plan Explorer.

You'll see that your first variant has Actual Data Size = 11,272 MB in 100,276 rows.

In the second variant, the query that populates the temp table returns only 173 KB in 19,665 rows. The last query returns 1,685 MB in 87,510 rows.

11,272 MB is much more than 1,685 MB, so it is no wonder the first query is slower.

This difference is caused by two factors:

  1. In the first variant you select all columns from the UserSearches, Quotes and ContainerDetails tables, while in the second variant you select only Id from ContainerDetails. Apart from the extra bytes read from disk and sent over the network, this difference results in substantially different plans. The second variant doesn't do a Sort or a Key Lookup, and it uses Hash joins instead of Nested Loops. It uses different indexes on Quotes, and an index Scan on ContainerDetails instead of a Seek.

  2. The queries produce different numbers of rows, because the first variant uses LEFT JOIN on Surcharges and the second uses INNER JOIN.

So, to make them comparable:

  1. Instead of using *, explicitly list only the columns that you need.
  2. Use INNER JOIN (or LEFT JOIN) on Surcharges in both variants.
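
The effect of the join type on the row count can be seen in a small, self-contained sketch. It uses Python's sqlite3 module as a stand-in for SQL Server, and the sample rows and the Amount column are invented for illustration; only the table names come from the question.

```python
import sqlite3

# Hypothetical miniature data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContainerDetails (Id INTEGER PRIMARY KEY, QuoteId INTEGER);
    -- Amount is an invented column, used only to give Surcharges some payload.
    CREATE TABLE Surcharges (Id INTEGER PRIMARY KEY, ContainerDetailId INTEGER, Amount REAL);
    INSERT INTO ContainerDetails (Id, QuoteId) VALUES (1, 10), (2, 10), (3, 11);
    -- Only container 1 has surcharges; containers 2 and 3 have none.
    INSERT INTO Surcharges (Id, ContainerDetailId, Amount) VALUES (100, 1, 5.0), (101, 1, 7.5);
""")

# LEFT JOIN keeps every ContainerDetails row, padding with NULL when no surcharge matches.
left_rows = conn.execute("""
    SELECT cd.Id, s.Id
    FROM ContainerDetails cd
    LEFT OUTER JOIN Surcharges s ON s.ContainerDetailId = cd.Id
    ORDER BY cd.Id, s.Id
""").fetchall()

# INNER JOIN keeps only the ContainerDetails rows that have at least one surcharge.
inner_rows = conn.execute("""
    SELECT cd.Id, s.Id
    FROM ContainerDetails cd
    INNER JOIN Surcharges s ON s.ContainerDetailId = cd.Id
    ORDER BY cd.Id, s.Id
""").fetchall()

print(len(left_rows), len(inner_rows))
```

With this sample data the LEFT JOIN returns four rows (two of them with NULL in the surcharge column) and the INNER JOIN returns two, so timings of the two forms are not directly comparable.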

Update

Your question was "why would SQL Server run the second query faster?" The answer is: because the queries are different and they produce different results (a different set of rows and a different set of columns).

Now you are asking another question: how to make them the same and fast.

Which of your two variants produces the correct result that you want? I'll assume that it is the second variant, with the temp table.

Please note that I'm not answering here how to make them fast. I'm answering how to make them the same.

The following single query should produce exactly the same results as your second variant, but without an explicit temporary table. I would expect its performance to be similar to that of your second variant. I deliberately wrote it using a CTE to copy the structure of your temp table variant, though it is easy to rewrite it without one. The optimizer would be smart enough to do that anyway.

WITH
CTE
AS
(
    Select cd.Id
    From
        UserSearches us
        left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
        left outer join ContainerDetails cd on cd.QuoteId = q.Id
    where
        us.SearchDate between @beginDate and @endDate
)
Select *
From
    Surcharges s
    inner join CTE on s.ContainerDetailId = CTE.Id
;
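
As a rough check of that equivalence, here is a minimal sketch that runs both shapes against invented sample data. It assumes SQLite (via Python's sqlite3 module) as a stand-in for SQL Server, so it says nothing about SQL Server's plans or timings; the rows, dates and the cdIds table name (replacing #cdIds) are hypothetical.

```python
import sqlite3

# Hypothetical miniature data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserSearches (Id INTEGER PRIMARY KEY, SearchDate TEXT);
    CREATE TABLE Quotes (Id INTEGER PRIMARY KEY, UserSearchId INTEGER, QuoteNumber TEXT);
    CREATE TABLE ContainerDetails (Id INTEGER PRIMARY KEY, QuoteId INTEGER);
    CREATE TABLE Surcharges (Id INTEGER PRIMARY KEY, ContainerDetailId INTEGER);
    INSERT INTO UserSearches VALUES (1, '2015-09-01'), (2, '2015-09-02');
    INSERT INTO Quotes VALUES (10, 1, 'Q-1');            -- search 2 has no quote
    INSERT INTO ContainerDetails VALUES (1, 10), (2, 10);
    INSERT INTO Surcharges VALUES (100, 1), (101, 1), (102, 2), (103, 99);
""")
params = {"beginDate": "2015-09-01", "endDate": "2015-09-30"}

# The id-collecting query shared by both variants.
id_query = """
    SELECT cd.Id
    FROM UserSearches us
    LEFT OUTER JOIN Quotes q ON q.UserSearchId = us.Id AND q.QuoteNumber IS NOT NULL
    LEFT OUTER JOIN ContainerDetails cd ON cd.QuoteId = q.Id
    WHERE us.SearchDate BETWEEN :beginDate AND :endDate
"""

# Variant with a temp table (cdIds plays the role of #cdIds).
conn.execute("CREATE TEMP TABLE cdIds (Id INTEGER)")
conn.execute("INSERT INTO cdIds " + id_query, params)
temp_rows = conn.execute("""
    SELECT s.Id FROM Surcharges s
    INNER JOIN cdIds ON s.ContainerDetailId = cdIds.Id
    ORDER BY s.Id
""").fetchall()

# Variant with a CTE in a single statement.
cte_rows = conn.execute(
    "WITH CTE AS (" + id_query + ") "
    "SELECT s.Id FROM Surcharges s "
    "INNER JOIN CTE ON s.ContainerDetailId = CTE.Id ORDER BY s.Id",
    params,
).fetchall()

# The left joins put a NULL id into the temp table for the search without a quote.
null_ids = conn.execute("SELECT COUNT(*) FROM cdIds WHERE Id IS NULL").fetchone()[0]

print(temp_rows == cte_rows, null_ids)
```

Both variants return the same three surcharges here. The NULL id produced by the left joins never matches anything in the inner join, so it has no effect on the final result.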

Vladimir Baranov answered Oct 19 '22 03:10


Well, I can see two large contributing differences, both in the queries themselves and in the plans.

First, and likely most impactful, in your second version (the one using the temp table) your final query against the Surcharges table uses an INNER JOIN instead of the LEFT JOIN you were using in the original query. I'm not sure which version is accurate, but the difference in the number of returned records seems to be very high based on the plan information (18.6 million in the first version vs. 5.1 million in the second version). If you change your first version to an INNER JOIN on the Surcharges table, do you see similar results in terms of duration?

Second, and likely less impactful, your second version seems to get a parallel execution plan for the select...into portion of the batch. Without more information I wouldn't dare comment on why that may be, but it is a potential differentiator.

I'd start with the first contributor and see what you wind up with and go from there.

EDIT:

To help clarify with the comments, try changing your first query to the following, then attach the query plan and compare its results and duration with the temp table/select...into version:

Select *
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
INNER join Surcharges s on s.ContainerDetailId = cd.Id
where us.SearchDate between @beginDate and @endDate

That should hopefully give you a more-or-less similar duration to the second version. If it still doesn't, please attach the query plan for that version.

boydc7 answered Oct 19 '22 05:10


But you are not comparing apples to apples.
The first is 3 left joins.
The second is 2 left joins and 1 inner join.
And in the second the results are split.

Try this:
Move us.SearchDate between @beginDate and @endDate up into the join.
I suspect it is doing a massive join and filtering last.
Get the date filter to happen early.

Select *
  From UserSearches us
  left outer join Quotes q 
        on q.UserSearchId = us.Id 
       and q.QuoteNumber is not null 
       and us.SearchDate between @beginDate and @endDate
  left outer join ContainerDetails cd 
        on cd.QuoteId = q.Id
  left outer join Surcharges s 
        on s.ContainerDetailId = cd.Id
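
One caveat with this rewrite: for a LEFT JOIN, a condition on the left-hand table placed in the ON clause does not filter that table's rows. It only decides whether the right-hand side matches, so searches outside the date range still come back, padded with NULLs. The sketch below illustrates this with invented sample rows and dates, assuming SQLite (via Python's sqlite3 module) as a stand-in for SQL Server.

```python
import sqlite3

# Hypothetical miniature data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserSearches (Id INTEGER PRIMARY KEY, SearchDate TEXT);
    CREATE TABLE Quotes (Id INTEGER PRIMARY KEY, UserSearchId INTEGER, QuoteNumber TEXT);
    INSERT INTO UserSearches VALUES (1, '2015-09-01'), (2, '2015-09-10'), (3, '2015-12-25');
    INSERT INTO Quotes VALUES (10, 1, 'Q-1'), (11, 3, 'Q-3');
""")
params = {"beginDate": "2015-09-01", "endDate": "2015-09-30"}

# Date filter in WHERE: searches outside the range are removed from the result.
where_rows = conn.execute("""
    SELECT us.Id, q.Id
    FROM UserSearches us
    LEFT OUTER JOIN Quotes q
        ON q.UserSearchId = us.Id AND q.QuoteNumber IS NOT NULL
    WHERE us.SearchDate BETWEEN :beginDate AND :endDate
    ORDER BY us.Id
""", params).fetchall()

# Date filter in ON: every search is kept; out-of-range ones just get no quote.
on_rows = conn.execute("""
    SELECT us.Id, q.Id
    FROM UserSearches us
    LEFT OUTER JOIN Quotes q
        ON q.UserSearchId = us.Id AND q.QuoteNumber IS NOT NULL
       AND us.SearchDate BETWEEN :beginDate AND :endDate
    ORDER BY us.Id
""", params).fetchall()

print(where_rows)
print(on_rows)
```

With the filter in WHERE, search 3 (dated December) is excluded; with the filter in ON, it is returned with a NULL quote. The two forms only agree when the join is an INNER JOIN.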

The fast version makes no sense to me.

Those left joins do absolutely nothing for this query.
All the LEFT does is add rows where cd.Id is null.

Select cd.Id into #cdIds
From UserSearches us
left outer join Quotes q on q.UserSearchId = us.Id and q.QuoteNumber is not null
left outer join ContainerDetails cd on cd.QuoteId = q.Id
where us.SearchDate between @beginDate and @endDate

If you just want Surcharges, then:

   Select s.*
     From UserSearches us
     join Quotes q 
       on q.UserSearchId = us.Id 
      and q.QuoteNumber is not null 
      and us.SearchDate between @beginDate and @endDate
     join ContainerDetails cd 
       on cd.QuoteId = q.Id
     join Surcharges s 
       on s.ContainerDetailId = cd.Id

paparazzo answered Oct 19 '22 04:10