I have posted something similar before, but I am approaching this from a different direction now so I opened a new question. I hope this is OK.
I have been working with a CTE that creates a sum of charges based on a Parent Charge. The SQL and details can be seen here:
CTE Index recommendations on multiple keyed table
I don't think I am missing anything on the CTE, but I am getting a problem when I use it with a big table of data (3.5 million rows).
The table tblChargeShare
contains some other information that I need, such as an InvoiceID
, so I placed my CTE in a view vwChargeShareSubCharges
and joined it to the table.
The query:
Select t.* from vwChargeShareSubCharges t
inner join
tblChargeShare s
on t.CustomerID = s.CustomerID
and t.MasterChargeID = s.ChargeID
Where s.ChargeID = 1291094
Returns a result in a few ms.
The query:
Select ChargeID from tblChargeShare Where InvoiceID = 1045854
Returns 1 row:
1291094
But the query:
Select t.* from vwChargeShareSubCharges t
inner join
tblChargeShare s
on t.CustomerID = s.CustomerID
and t.MasterChargeID = s.ChargeID
Where InvoiceID = 1045854
Takes 2-3 minutes to run.
I saved the execution plans and loaded them into SQL Sentry. The Tree for the fast query looks like this:
The plan from the slow query is:
I have tried reindexing, running the query through tuning advisor and various combinations of sub queries. Whenever the join contains anything other than the PK, the query is slow.
I had a similar question here:
SQL Server Query time out depending on Where Clause
Which used functions to do the summimg of child rows instead of a CTE. This is the rewrite using CTE to try and avoid the same problem I am now experiencing. I have read the responses in that answer but I am none the wiser - I read some information about hints and parameters but I can't make it work. I had thought that rewriting using a CTE would solve my problem. The query is fast when running on a tblCharge with a few thousand rows.
Tested in both SQL 2008 R2 and SQL 2012
Edit:
I have condensed the query into a single statement, but the same issue persists:
WITH RCTE AS
(
SELECT ParentChargeId, s.ChargeID, 1 AS Lvl, ISNULL(TotalAmount, 0) as TotalAmount, ISNULL(s.TaxAmount, 0) as TaxAmount,
ISNULL(s.DiscountAmount, 0) as DiscountAmount, s.CustomerID, c.ChargeID as MasterChargeID
from tblCharge c inner join tblChargeShare s
on c.ChargeID = s.ChargeID Where s.ChargeShareStatusID < 3 and ParentChargeID is NULL
UNION ALL
SELECT c.ParentChargeID, c.ChargeID, Lvl+1 AS Lvl, ISNULL(s.TotalAmount, 0), ISNULL(s.TaxAmount, 0), ISNULL(s.DiscountAmount, 0) , s.CustomerID
, rc.MasterChargeID
from tblCharge c inner join tblChargeShare s
on c.ChargeID = s.ChargeID
INNER JOIN RCTE rc ON c.PArentChargeID = rc.ChargeID and s.CustomerID = rc.CustomerID Where s.ChargeShareStatusID < 3
)
Select MasterChargeID as ChargeID, rcte.CustomerID, Sum(rcte.TotalAmount) as TotalCharged, Sum(rcte.TaxAmount) as TotalTax, Sum(rcte.DiscountAmount) as TotalDiscount
from RCTE inner join tblChargeShare s on rcte.ChargeID = s.ChargeID and RCTE.CustomerID = s.CustomerID
Where InvoiceID = 1045854
Group by MasterChargeID, rcte.CustomerID
GO
Edit: More playing around,I just don't understand this.
This query is instant (2ms):
Select t.* from
vwChargeShareSubCharges t
Where t.MasterChargeID = 1291094
Whereas this takes 3 minutes:
DECLARE @ChargeID int = 1291094
Select t.* from
vwChargeShareSubCharges t
Where t.MasterChargeID = @ChargeID
Even if I put heaps of numbers in an "In", the query is still instant:
Where t.MasterChargeID in (1291090, 1291091, 1291092, 1291093, 1291094, 1291095, 1291096, 1291097, 1291098, 1291099, 129109)
Edit 2:
I can replicate this from scratch using this example data:
I have created some dummy data to replicate the issue. It isn't so significant, as I only added 100,000 rows, but the bad execution plan still happens (run in SQLCMD mode):
CREATE TABLE [tblChargeTest](
[ChargeID] [int] IDENTITY(1,1) NOT NULL,
[ParentChargeID] [int] NULL,
[TotalAmount] [money] NULL,
[TaxAmount] [money] NULL,
[DiscountAmount] [money] NULL,
[InvoiceID] [int] NULL,
CONSTRAINT [PK_tblChargeTest] PRIMARY KEY CLUSTERED
(
[ChargeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO
Insert into tblChargeTest
(discountAmount, TotalAmount, TaxAmount)
Select ABS(CHECKSUM(NewId())) % 10, ABS(CHECKSUM(NewId())) % 100, ABS(CHECKSUM(NewId())) % 10
GO 100000
Update tblChargeTest
Set ParentChargeID = (ABS(CHECKSUM(NewId())) % 60000) + 20000
Where ChargeID = (ABS(CHECKSUM(NewId())) % 20000)
GO 5000
CREATE VIEW [vwChargeShareSubCharges] AS
WITH RCTE AS
(
SELECT ParentChargeId, ChargeID, 1 AS Lvl, ISNULL(TotalAmount, 0) as TotalAmount, ISNULL(TaxAmount, 0) as TaxAmount,
ISNULL(DiscountAmount, 0) as DiscountAmount, ChargeID as MasterChargeID
FROM tblChargeTest Where ParentChargeID is NULL
UNION ALL
SELECT rh.ParentChargeID, rh.ChargeID, Lvl+1 AS Lvl, ISNULL(rh.TotalAmount, 0), ISNULL(rh.TaxAmount, 0), ISNULL(rh.DiscountAmount, 0)
, rc.MasterChargeID
FROM tblChargeTest rh
INNER JOIN RCTE rc ON rh.PArentChargeID = rc.ChargeID --and rh.CustomerID = rc.CustomerID
)
Select MasterChargeID, ParentChargeID, ChargeID, TotalAmount, TaxAmount, DiscountAmount , Lvl
FROM RCTE r
GO
Then run these two queries:
--Slow Query:
Declare @ChargeID int = 60900
Select *
from [vwChargeShareSubCharges]
Where MasterChargeID = @ChargeID
--Fast Query:
Select *
from [vwChargeShareSubCharges]
Where MasterChargeID = 60900
As you said that 'there is no difference between these two query in terms of performance', so I think in your simple example LEFT JOIN and CTE might have the same performance. I think the biggest benefit for using CTEs is readability.
The performance of CTEs and subqueries should, in theory, be the same since both provide the same information to the query optimizer. One difference is that a CTE used more than once could be easily identified and calculated once. The results could then be stored and read multiple times.
Looking at SQL Profiler results from these queries (each were run 10 times and averages are below) we can see that the CTE just slightly outperforms both the temporary table and table variable queries when it comes to overall duration.
Disadvantages of CTECTE members are unable to use the keyword clauses like Distinct, Group By, Having, Top, Joins, etc. The CTE can only be referenced once by the Recursive member. We cannot use the table variables and CTEs as parameters in stored procedures.
The best SQL Server can do for you here is to push the filter on ChargeID
down into the anchor part of the recursive CTE inside the view. That allows a seek to find the only row you need to build the hierarchy from. When you provide the parameter as a constant value SQL Server can make that optimization (using a rule called SelOnIterator
, for those who are interested in that sort of thing):
When you use a local variable it can not do this, so the predicate on ChargeID
gets stuck outside the view (which builds the full hierarchy starting from all NULL
ids):
One way to get the optimal plan when using a variable is to force the optimizer to compile a fresh plan on every execution. The resulting plan is then tailored to the specific value in the variable at execution time. This is achieved by adding an OPTION (RECOMPILE)
query hint:
Declare @ChargeID int = 60900;
-- Produces a fast execution plan, at the cost of a compile on every execution
Select *
from [vwChargeShareSubCharges]
Where MasterChargeID = @ChargeID
OPTION (RECOMPILE);
A second option is to change the view into an inline table function. This allows you to specify the position of the filtering predicate explicitly:
CREATE FUNCTION [dbo].[udfChargeShareSubCharges]
(
@ChargeID int
)
RETURNS TABLE AS RETURN
(
WITH RCTE AS
(
SELECT ParentChargeID, ChargeID, 1 AS Lvl, ISNULL(TotalAmount, 0) as TotalAmount, ISNULL(TaxAmount, 0) as TaxAmount,
ISNULL(DiscountAmount, 0) as DiscountAmount, ChargeID as MasterChargeID
FROM tblChargeTest
Where ParentChargeID is NULL
AND ChargeID = @ChargeID -- Filter placed here explicitly
UNION ALL
SELECT rh.ParentChargeID, rh.ChargeID, Lvl+1 AS Lvl, ISNULL(rh.TotalAmount, 0), ISNULL(rh.TaxAmount, 0), ISNULL(rh.DiscountAmount, 0)
, rc.MasterChargeID
FROM tblChargeTest rh
INNER JOIN RCTE rc ON rh.ParentChargeID = rc.ChargeID --and rh.CustomerID = rc.CustomerID
)
Select MasterChargeID, ParentChargeID, ChargeID, TotalAmount, TaxAmount, DiscountAmount , Lvl
FROM RCTE r
)
Use it like this:
Declare @ChargeID int = 60900
select *
from dbo.udfChargeShareSubCharges(@ChargeID)
The query can also benefit from an index on ParentChargeID
.
create index ix_ParentChargeID on tblChargeTest(ParentChargeID)
Here is another answer about a similar optimization rule in a similar scenario. Optimizing Execution Plans for Parameterized T-SQL Queries Containing Window Functions
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With