I have the SQL query below, which is running very slowly. I took a look at the execution plan, and it claims that a sort on Files.OrderId is the highest-cost operation (53%). Why would this be happening if I am not ordering by OrderId anywhere? Is my best bet to create an index on Files.OrderId?
Execution plan if anyone is interested.
with custOrders as (
    SELECT c.firstName + ' ' + c.lastname as Customer,
           c.PartnerId,
           c.CustomerId,
           o.OrderId,
           o.CreateDate,
           c.IsPrimary
    FROM Customers c
    LEFT JOIN CustomerRelationships as cr
        ON c.CustomerId = cr.PrimaryCustomerId
    INNER JOIN Orders as o
        ON c.customerid = o.customerid
        OR (cr.secondarycustomerid IS NOT NULL AND o.customerid = cr.secondarycustomerid)
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'
),
temp as (
    SELECT Row_number() OVER (ORDER BY c.createdate DESC) AS 'row_number',
           c.customerid as customerId,
           c.partnerid as partnerId,
           c.Customer,
           c.orderid as OrderId,
           c.createdate as CreateDate,
           Count(f.orderid) AS FileCount,
           dbo.Getparentcustomerid(c.isprimary, c.customerid) AS ParentCustomerId,
           au.firstname + ' ' + au.lastname AS Admin,
           '' as blank,
           0 as zero
    FROM custOrders c
    INNER JOIN files f ON c.orderid = f.orderid
    INNER JOIN admincustomers ac ON c.customerid = ac.customerid
    INNER JOIN adminusers au ON ac.adminuserid = au.id
    INNER JOIN filestatuses s ON f.statusid = s.statusid
    WHERE ac.adminuserid IS NOT NULL
      AND f.statusid NOT IN ( 5, 6 )
    GROUP BY c.customerid, c.partnerid, c.Customer, c.isprimary, c.orderid,
             c.createdate, au.firstname, au.lastname
)
SQL Server Sort operator: if you hover the mouse over the SORT operator, you will see that its output is the same set of input columns, just sorted by the specified column (the tooltip shows this). As you can see from the execution plan, the SORT operator is an expensive one.
Window functions: when there's a clustered index on the table defined with orderid, ascending, as the key, the plan can use an ordered forward scan of that index to provide the window function with the rows in the order it needs for the computation, so there's no need for an explicit sort.
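A minimal illustration of that point, using a hypothetical table (the names here are made up, not from the query above):

-- With the clustered key matching the window order, the plan uses an
-- ordered forward scan of the clustered index and needs no Sort operator.
CREATE TABLE dbo.OrdersDemo
(
    OrderId INT NOT NULL PRIMARY KEY CLUSTERED,  -- clustered key = window order
    Amount  MONEY NULL
);

SELECT OrderId,
       ROW_NUMBER() OVER (ORDER BY OrderId) AS rn
FROM dbo.OrdersDemo;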
A plan change can occur for a variety of reasons, including but not limited to the following types of changes in the system: optimizer version, optimizer statistics, optimizer parameters, schema/metadata definitions, system settings, and SQL profile creation.
SQL Server has three algorithms to choose from when it needs to join two tables: the Nested Loops Join, the Hash Join, and the Sort-Merge Join. It selects between them based on cost estimates. In this case it decided, based on the information available to it, that a Sort-Merge Join was the right choice.
In SQL Server execution plans, a Sort-Merge is split into two operators, the Sort and the Merge Join, because the sort operation might not be necessary, for example if the data is sorted already.
For more information about joins, check out my join series here: http://sqlity.net/en/1146/a-join-a-day-introduction/ The article about the Sort-Merge Join is here: http://sqlity.net/en/1480/a-join-a-day-the-sort-merge-join/
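If you want to see how the alternatives compare, you can force a specific algorithm with a join hint and look at the resulting plans. This is a diagnostic only, shown here on a hypothetical simplified version of the join, not your full query:

-- Diagnostic only: force the join algorithm and compare plan costs.
-- (Join hints also fix the join order, so don't leave them in production.)
SELECT o.OrderId, f.statusid
FROM Orders o
INNER MERGE JOIN files f      -- try MERGE, HASH, or LOOP here
    ON o.OrderId = f.orderid;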
To make your query faster, I would first look at indexes. You have a bunch of clustered index scans in the query. If you can replace a few of them with seeks, you will most likely be better off. Also check whether the estimates that SQL Server produces match the actual row counts in an actual execution plan. If they are far off, SQL Server often makes bad choices, so providing better statistics can help your query performance too.
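For example (a sketch; the index names and column choices are assumptions based on the query text, so verify them against the actual plan before creating anything):

-- Possible supporting indexes (names and column choices are assumptions).
CREATE NONCLUSTERED INDEX IX_Files_OrderId_StatusId
    ON files (orderid, statusid);

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON Orders (customerid);

-- Refresh statistics if estimated and actual row counts diverge.
UPDATE STATISTICS files WITH FULLSCAN;

The first index would also deliver the files rows already sorted by orderid, which is exactly the order the Sort operator in your plan is producing for the merge join.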
SQL Server is performing the sort to enable the merge join between the dataset to the right of that sort operator and the records in the Orders table. A merge join itself is a very efficient way to join all the records in a dataset, but it requires that each input be sorted by the join keys, in the same order.

Since PK_Orders is already ordered by OrderID, SQL Server decided to take advantage of that by sorting the other end of the join (everything to the right of the sort) so that the two datasets can be merged together at that point in the plan. The common alternative to a merge join is a hash join, but that wouldn't help you: you would simply have an expensive hash join operator instead of the sort and merge. The query optimizer has determined the sort and merge to be more efficient in this case.
The root cause of the expensive step in the plan is the need to combine all the records from the Orders table into the dataset. Is there a way to limit the records coming from the files table? An index on files.statusid may be helpful if the rows not in (5, 6) are less than about 10% of the table; see the sketch below.

The query optimizer thinks that most of the records are going to be filtered out at the end. Try to push as many of those filter conditions back to the record sources, so that fewer records have to be handled in the middle of the plan.
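One way to do that, as a sketch (assuming statusid is non-nullable and that 5 and 6 are the only excluded statuses, per the query), is a filtered index that pre-excludes those rows and hands the join the orderid values it needs:

-- Sketch: filtered index that excludes statuses 5 and 6 up front.
-- NOT IN is not allowed in a filtered index predicate, so use <> twice.
CREATE NONCLUSTERED INDEX IX_Files_OrderId_Active
    ON files (orderid)
    INCLUDE (statusid)
    WHERE statusid <> 5 AND statusid <> 6;

If most of the table has status 5 or 6, this index stays small and reading it is cheap.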
EDIT: I forgot to mention that it is very helpful to have an execution plan we can look at. Is there any way to get an actual execution plan, so we can see the real number of records going through those operators? Estimated record counts can sometimes be a little off.
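In SSMS, toggle "Include Actual Execution Plan" (Ctrl+M) before running the query; the T-SQL equivalent is:

-- Returns the actual plan, including real row counts, as XML.
SET STATISTICS XML ON;
-- ... run the query here ...
SET STATISTICS XML OFF;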
EDIT: Looking deeper into the second-to-last Filter operator's Predicate field, summarized:
c.CustomerId=o.CustomerId OR o.CustomerId=cr.SecondaryCustomerId AND cr.SecondaryCustomerId IS NOT NULL
Looks like SQL Server is producing a cross join between all possible matching records between Orders and Customers up to this point in the query (the plan to the right of the second-to-last Filter operator) and then checking each record against that condition to see if it does indeed match. Notice how the line going into the Filter is really fat and the line coming out is really thin? That's because the estimated row count drops from 21k to 4 at that operator. Forget what I said earlier; this is probably the main problem in the plan. Even if there are indexes on these columns, SQL Server can't use them, because the join condition is too complex. It forces the plan to merge all the records together and filter afterwards, instead of seeking to just the ones you need, because the full join predicate can't be applied right away.
My first thought is to rephrase the CTE custOrders as a union of two datasets: one joining Orders on CustomerId and one joining on SecondaryCustomerId. This duplicates the work of the rest of the CTE, but if it enables proper use of the indexes, it could be a big win. A sketch is below.
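Something like this, based on the original CTE (a sketch, not tested; UNION rather than UNION ALL so an order that qualifies through both paths isn't returned twice; verify it produces the same rows as the original):

with custOrders as (
    SELECT c.firstName + ' ' + c.lastname as Customer,
           c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN Orders o
        ON o.customerid = c.customerid            -- simple, index-friendly predicate
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'

    UNION   -- removes duplicates across the two branches

    SELECT c.firstName + ' ' + c.lastname,
           c.PartnerId, c.CustomerId, o.OrderId, o.CreateDate, c.IsPrimary
    FROM Customers c
    INNER JOIN CustomerRelationships cr
        ON cr.PrimaryCustomerId = c.CustomerId
    INNER JOIN Orders o
        ON o.customerid = cr.secondarycustomerid  -- simple, index-friendly predicate
    WHERE c.createdate >= @FromDate + ' 00:00'
      AND c.createdate <= @ToDate + ' 23:59'
)

Each branch now has a plain equality join, so an index on Orders(customerid), and one on CustomerRelationships(PrimaryCustomerId) including secondarycustomerid, can be used for seeks instead of the merge-everything-then-filter pattern described above.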