Using SQL Server 2014 (SP1-CU3, KB3094221, Oct 10 2015, x64), I have the following query:
SELECT * FROM dbo.table1 t1
LEFT JOIN dbo.table2 t2 ON t2.trade_id = t1.tradeNo
LEFT JOIN dbo.table3 t3 ON t3.TradeReportID = t1.tradeNo
ORDER BY t1.tradeNo
There are ~70k, 35k, and 73k rows in t1, t2, and t3 respectively.
When I omit the ORDER BY, this query executes in 3 seconds and returns 73k rows. As written, the query had returned ~50k rows after 8.5 minutes, at which point I stopped it.
Switching the order of the LEFT JOINs makes a difference:
SELECT * FROM dbo.table1 t1
LEFT JOIN dbo.table3 t3 ON t3.TradeReportID = t1.tradeNo
LEFT JOIN dbo.table2 t2 ON t2.trade_id = t1.tradeNo
ORDER BY t1.tradeNo
This now runs in 3 seconds.
I don't have any indexes on the tables. Adding indexes on t1.tradeNo, t2.trade_id, and t3.TradeReportID has no effect.
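For reference, the indexes were created roughly along these lines (index names are my own; a minimal sketch):

-- Hypothetical index names; one single-column nonclustered index per join key
CREATE NONCLUSTERED INDEX IX_table1_tradeNo       ON dbo.table1 (tradeNo);
CREATE NONCLUSTERED INDEX IX_table2_trade_id      ON dbo.table2 (trade_id);
CREATE NONCLUSTERED INDEX IX_table3_TradeReportID ON dbo.table3 (TradeReportID);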
Running the query with only one LEFT JOIN (either one) in combination with the ORDER BY is fast.
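That is, each of these single-join variants returns quickly:

SELECT * FROM dbo.table1 t1
LEFT JOIN dbo.table2 t2 ON t2.trade_id = t1.tradeNo
ORDER BY t1.tradeNo;

SELECT * FROM dbo.table1 t1
LEFT JOIN dbo.table3 t3 ON t3.TradeReportID = t1.tradeNo
ORDER BY t1.tradeNo;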
It's fine for me to swap the order of the LEFT JOINs, but this doesn't go far toward explaining why this happens and under what scenarios it may happen again.
The estimated execution plans are markedly different, but I cannot interpret them to explain the performance difference:

[execution plan screenshot: slow query, with exclamation-mark warning details]

vs. with the order of the LEFT JOINs switched:

[execution plan screenshot: fast query]
UPDATE 1: It appears the addition of the ORDER BY clause results in the execution plan using a Table Spool (lazy spool), versus NOT using it in the fast query.
If I turn off the table spool via DBCC RULEOFF ('BuildSpool'); this 'fixes' the speed, but according to this post that isn't recommended, and it cannot be done per query anyway.
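The switch is undocumented and instance-wide; roughly what I ran (a sketch only, re-enabling the rule afterwards):

DBCC RULEOFF ('BuildSpool');   -- disable the optimizer rule (undocumented, affects the whole instance)
DBCC FREEPROCCACHE;            -- clear cached plans so the change takes effect
-- ... run the query ...
DBCC RULEON ('BuildSpool');    -- re-enable the rule afterwards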
UPDATE 2: One of the columns returned (table3.Text) has type varchar(max). If this is changed to nvarchar(512), the original (slow) query is now fast, i.e. the execution plan decides not to use the Table Spool. Also note that even though the type is varchar(max), the field value is NULL in every one of the rows. This is now fixable, but I am none the wiser.
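The column change, roughly (a sketch; nullability assumed):

-- Shrink the unbounded column to a fixed size; the column is NULL in every row anyway
ALTER TABLE dbo.table3 ALTER COLUMN [Text] nvarchar(512) NULL;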
UPDATE 3: Warnings in the execution plan stated:

Type conversion in expression (CONVERT_IMPLICIT(nvarchar(50),[t2].[trade_id],0)) may affect "CardinalityEstimate" in query plan choice, ...

t1.tradeNo is nvarchar(21); the other two join columns are varchar(50). After altering the latter two to the same type as the first, the problem disappears! (This leaves the varchar(max) column from UPDATE 2 unchanged.)
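The alterations, roughly (a sketch, again assuming nullable columns):

-- Align both join keys with t1.tradeNo's type so no implicit conversion is needed
ALTER TABLE dbo.table2 ALTER COLUMN trade_id      nvarchar(21) NULL;
ALTER TABLE dbo.table3 ALTER COLUMN TradeReportID nvarchar(21) NULL;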
Given that this problem goes away when either UPDATE 2 or UPDATE 3 is rectified, I would guess it's a combination of the type mismatch on the join columns and the query optimizer using a temp table (table spool) for a column that has an unbounded size. Very interesting, given that the varchar(max) column holds no data in any of the rows.
Your likely best fix is to make sure both sides of your joins have the same datatype. There's no need for one to be varchar and the other nvarchar.
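One quick way to spot such mismatches is to compare the join-key definitions in the catalog; a sketch using the standard INFORMATION_SCHEMA views, with the table and column names from the question:

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE (TABLE_NAME = 'table1' AND COLUMN_NAME = 'tradeNo')
   OR (TABLE_NAME = 'table2' AND COLUMN_NAME = 'trade_id')
   OR (TABLE_NAME = 'table3' AND COLUMN_NAME = 'TradeReportID');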
This is a class of problems that comes up quite frequently in databases: the engine assumes the wrong thing about the composition of the data it's about to deal with, so the costs shown in your estimated execution plan are likely a long way from what you'd get in the actual plan. We all make mistakes, and it would be good if SQL Server learned from its own, but currently it doesn't: it will estimate a 2-second return time despite being immediately proven wrong again and again. To be fair, I don't know of any DBMS that has machine learning to do better.
In your fast query, the hardest part is done up front by sorting table3. That means it can do an efficient merge join, which in turn means it has no reason to be lazy about spooling.
Having an ID that refers to the same thing stored as two different types and lengths shouldn't ever be necessary and will never be a good idea for an ID; in this case it's nvarchar in one place and varchar in another. When that causes the optimizer to fail at cardinality estimation, that is the key flaw, and here's why:
The optimizer is hoping to require only a few unique rows from table3: just a handful of options like "Male", "Female", "Other". That would be what is known as "low cardinality". So imagine tradeNo actually contained IDs for genders, for some weird reason. Remember, it's you, with your human skills of contextualisation, who knows that's very unlikely; the DB is blind to that. So here is what it expects to happen: as it executes the query, the first time it encounters the ID for "Male" it will lazily fetch the associated data (like the word "Male") and put it in the spool. Then, because the input is sorted, it expects just a lot more males, and to simply re-use what it has already put in the spool.
Basically, it plans to fetch the data from tables 1 and 2 in a few big chunks stopping once or twice to fetch new details from table 3. In practice the stopping isn't occasional. In fact, it may even be stopping for every single row because there are lots of different IDs here. The lazy spool is like going upstairs to get one small thing at a time. Good if you think you just need your wallet. Not so good if you're moving house, in which case you'll want a big box (the eager spool).
The likely reason that shrinking the size of the field in table3 helped is that the optimizer then estimated less of a comparative benefit in doing the lazy spool over a full sort up front. With varchar(max) it doesn't know how much data there is, just how much there could potentially be, and the bigger the potential chunks of data that need shuffling, the more physical work needs doing.
- Make your table schema and indexes reflect the real shape of the data. If an ID is varchar in one table then it's very unlikely to need the extra characters available in nvarchar in another table. Avoid the need for conversions on IDs, and use integers instead of characters where possible.
- Nudge in the right direction with join order.
- If all else fails, INCLUDE all the fields you'll need from table3 in the index on TradeReportID (see the sketch below this list). The reason the existing indexes couldn't help much is that they make it easy to identify how to re-sort, but the sort still hasn't physically been done. That is the work the optimizer was hoping to avoid with a lazy spool; if the data were included in the index it would already be available, so there would be no work to optimize.
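A rough sketch of that covering index; the INCLUDE list is an assumption, so include whichever table3 columns the query actually returns:

CREATE NONCLUSTERED INDEX IX_table3_TradeReportID_covering
ON dbo.table3 (TradeReportID)
INCLUDE ([Text]);  -- plus any other table3 columns the query selects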