I have the SQL query below, which is running very slowly. I took a look at the execution plan, and it claims that a sort on Files.OrderId is the highest-cost operation (53%). Why would this be happening if I am not ordering by OrderId anywhere? Is my best bet to create an index on Files.OrderId?
Execution plan if anyone is interested.
with custOrders as (
    SELECT c.firstName + ' ' + c.lastname as Customer,
           c.PartnerId,
           c.CustomerId,
           o.OrderId,
           o.CreateDate,
           c.IsPrimary
    FROM Customers c
    LEFT JOIN CustomerRelationships as cr
        ON c.CustomerId = cr.PrimaryCustomerId
    INNER JOIN Orders as o
        ON c.customerid = o.customerid
        OR (cr.secondarycustomerid IS NOT NULL AND o.customerid = cr.secondarycustomerid)
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'
),
temp as (
    SELECT Row_number() OVER (ORDER BY c.createdate DESC) AS 'row_number',
           c.customerid as customerId,
           c.partnerid as partnerId,
           c.Customer,
           c.orderid as OrderId,
           c.createdate as CreateDate,
           Count(f.orderid) AS FileCount,
           dbo.Getparentcustomerid(c.isprimary, c.customerid) AS ParentCustomerId,
           au.firstname + ' ' + au.lastname AS Admin,
           '' as blank,
           0 as zero
    FROM custOrders c
    INNER JOIN files f ON c.orderid = f.orderid
    INNER JOIN admincustomers ac ON c.customerid = ac.customerid
    INNER JOIN adminusers au ON ac.adminuserid = au.id
    INNER JOIN filestatuses s ON f.statusid = s.statusid
    WHERE ac.adminuserid IS NOT NULL
      AND f.statusid NOT IN ( 5, 6 )
    GROUP BY c.customerid, c.partnerid, c.Customer, c.isprimary, c.orderid,
             c.createdate, au.firstname, au.lastname
)
SQL Server Sort operator: if you hover the mouse over the SORT operator, you will see that its output is the same set of input columns, just sorted by the specified column (the tooltip shows this). As you can see from the execution plan, the SORT operator is an expensive one.
Window functions: when there's a clustered index on the table defined with orderid, ascending, as the key, the plan can use an ordered forward scan of that index to provide the window function with the rows in the order it needs for the computation, so there's no need for an explicit sort.
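A minimal illustration of that point, using a hypothetical table (the names here are made up, not from the query above):

-- With the clustered key matching the window order, the plan uses an
-- ordered forward scan of the clustered index and needs no Sort operator.
CREATE TABLE dbo.OrdersDemo
(
    OrderId INT NOT NULL PRIMARY KEY CLUSTERED,  -- clustered key = window order
    Amount  MONEY NULL
);

SELECT OrderId,
       ROW_NUMBER() OVER (ORDER BY OrderId) AS rn
FROM dbo.OrdersDemo;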
A plan change can occur for a variety of reasons, including but not limited to the following types of changes in the system: optimizer version, optimizer statistics, optimizer parameters, schema/metadata definitions, system settings, and SQL profile creation.
SQL Server has three algorithms to choose from when it needs to join two tables: the Nested Loops Join, the Hash Join, and the Sort-Merge Join. It selects between them based on cost estimates. In this case it decided, based on the information available to it, that a Sort-Merge Join was the right choice.
In SQL Server execution plans, a Sort-Merge is split into two operators, the Sort and the Merge Join, because the sort operation might not be necessary, for example if the data is sorted already.
For more information about joins, check out my join series here: http://sqlity.net/en/1146/a-join-a-day-introduction/ The article about the Sort-Merge Join is here: http://sqlity.net/en/1480/a-join-a-day-the-sort-merge-join/
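If you want to see how the alternatives compare, you can force a specific algorithm with a join hint and look at the resulting plans. This is a diagnostic only, shown here on a hypothetical simplified version of the join, not your full query:

-- Diagnostic only: force the join algorithm and compare plan costs.
-- (Join hints also fix the join order, so don't leave them in production.)
SELECT o.OrderId, f.statusid
FROM Orders o
INNER MERGE JOIN files f      -- try MERGE, HASH, or LOOP here
    ON o.OrderId = f.orderid;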
To make your query faster, I would first look at indexes. You have a bunch of clustered index scans in the query. If you can replace a few of them with seeks, you will most likely be better off. Also check whether the estimates that SQL Server produces match the actual row counts in an actual execution plan. If they are far off, SQL Server often makes bad choices, so providing better statistics can help your query performance too.
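For example (a sketch; the index names and column choices are assumptions based on the query text, so verify them against the actual plan before creating anything):

-- Possible supporting indexes (names and column choices are assumptions).
CREATE NONCLUSTERED INDEX IX_Files_OrderId_StatusId
    ON files (orderid, statusid);

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON Orders (customerid);

-- Refresh statistics if estimated and actual row counts diverge.
UPDATE STATISTICS files WITH FULLSCAN;

The first index would also deliver the files rows already sorted by orderid, which is exactly the order the Sort operator in your plan is producing for the merge join.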
SQL Server is performing the sort to enable the merge join between the dataset to the right of that sort operator and the records in the Orders table. A merge join itself is a very efficient way to join all the records in a dataset, but it requires that each input be sorted by the join keys, in the same order.

Since PK_Orders is already ordered by OrderID, SQL Server decided to take advantage of that by sorting the other end of the join (everything to the right of the sort) so that the two datasets can be merged together at that point in the plan. The common alternative to a merge join is a hash join, but that wouldn't help you: you would simply have an expensive hash join operator instead of the sort and merge. The query optimizer has determined the sort and merge to be more efficient in this case.
The root cause of the expensive step in the plan is the need to combine all the records from the Orders table into the dataset. Is there a way to limit the records coming from the files table? An index on files.statusid may be helpful if the rows not in (5, 6) are less than about 10% of the table; see the sketch below.

The query optimizer thinks that most of the records are going to be filtered out at the end. Try to push as many of those filter conditions back to the record sources, so that fewer records have to be handled in the middle of the plan.
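One way to do that, as a sketch (assuming statusid is non-nullable and that 5 and 6 are the only excluded statuses, per the query), is a filtered index that pre-excludes those rows and hands the join the orderid values it needs:

-- Sketch: filtered index that excludes statuses 5 and 6 up front.
-- NOT IN is not allowed in a filtered index predicate, so use <> twice.
CREATE NONCLUSTERED INDEX IX_Files_OrderId_Active
    ON files (orderid)
    INCLUDE (statusid)
    WHERE statusid <> 5 AND statusid <> 6;

If most of the table has status 5 or 6, this index stays small and reading it is cheap.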
EDIT: I forgot to mention that it is very helpful to have an execution plan we can look at. Is there any way to get an actual execution plan, so we can see the real number of records going through those operators? Estimated record counts can sometimes be a little off.
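In SSMS, toggle "Include Actual Execution Plan" (Ctrl+M) before running the query; the T-SQL equivalent is:

-- Returns the actual plan, including real row counts, as XML.
SET STATISTICS XML ON;
-- ... run the query here ...
SET STATISTICS XML OFF;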
EDIT: Looking deeper into the second-to-last Filter operator's Predicate field, summarized:
c.CustomerId=o.CustomerId OR o.CustomerId=cr.SecondaryCustomerId AND cr.SecondaryCustomerId IS NOT NULL
Looks like SQL Server is producing a cross join between all possible matching records between Orders and Customers up to this point in the query (the plan to the right of the second-to-last Filter operator) and then checking each record against that condition to see if it does indeed match. Notice how the line going into the Filter is really fat and the line coming out is really thin? That's because the estimated row count drops from 21k to 4 at that operator. Forget what I said earlier; this is probably the main problem in the plan. Even if there are indexes on these columns, SQL Server can't use them, because the join condition is too complex. It forces the plan to merge all the records together and filter afterwards, instead of seeking to just the ones you need, because the full join predicate can't be applied right away.
My first thought is to rephrase the CTE custOrders as a union of two datasets: one joining Orders on CustomerId and one joining on SecondaryCustomerId. This duplicates the work of the rest of the CTE, but if it enables proper use of the indexes, it could be a big win. A sketch is below.
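Something like this, based on the original CTE (a sketch, not tested; UNION rather than UNION ALL so an order that qualifies through both paths isn't returned twice; verify it produces the same rows as the original):

with custOrders as (
    SELECT c.firstName + ' ' + c.lastname as Customer,
           c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN Orders o
        ON o.customerid = c.customerid            -- simple, index-friendly predicate
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'

    UNION   -- removes duplicates across the two branches

    SELECT c.firstName + ' ' + c.lastname,
           c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN CustomerRelationships cr
        ON cr.PrimaryCustomerId = c.CustomerId
    INNER JOIN Orders o
        ON o.customerid = cr.secondarycustomerid  -- simple, index-friendly predicate
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'
)

Each branch now has a plain equality join, so an index on Orders(customerid), and one on CustomerRelationships(PrimaryCustomerId) including secondarycustomerid, can be used for seeks instead of the merge-everything-then-filter pattern described above.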