Here is a table with sample data:
DECLARE @TestTable TABLE (
ItemID INT,
A INT,
B INT,
Month INT)
INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)
Something to note is that the B column is always the same as it's only used once in this calculation, but is needed for the initial calculation.
I am attempting to subtract B from A on the first row, then on subsequent rows subtract the previous rows difference from A. Effectively, B - A = C
on the first then C - A
on all subsequent rows FOR THE RELATED ItemID.
Here are the results I'm expecting:
ItemID A B C Month RowNumber
1234 5 9 4 1 1
1234 6 9 -2 2 2
1234 1 9 -3 3 3
1324 14 6 -8 1 1
1324 5 6 -13 2 2
1324 9 6 -22 3 3
4321 5 11 6 1 1
4321 12 11 -6 2 2
Here is how I am accomplishing this.
;WITH CTE_TestValue AS (
SELECT
Main.ItemID,
Main.A,
Main.B,
Main.Month,
ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
FROM @TestTable AS Main
),
CTE_TestColumnC AS (
SELECT
MainA.ItemID,
MainA.A,
MainA.B,
(MainA.B - MainA.A) AS C,
MainA.Month,
MainA.RowNumber
FROM CTE_TestValue AS MainA
WHERE MainA.Rownumber = 1
UNION ALL
SELECT
MainB.ItemID,
MainB.A,
MainB.B,
(Sub.C - MainB.A) AS C,
MainB.Month,
MainB.RowNumber
FROM CTE_TestValue AS MainB
INNER JOIN CTE_TestColumnC AS Sub
ON MainB.RowNumber - 1 = Sub.RowNumber
AND MainB.ItemID = Sub.ItemID
-- CROSS JOIN CTE_TestColumnC AS Sub
-- WHERE Sub.RowNumber + 1 = MainB.RowNumber
-- AND MainB.ItemID = Sub.ItemID
)
SELECT
Main.ItemID,
Main.A,
Main.B,
Main.C,
Main.Month,
Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber
This works fine on a small data-sample, but I'm dealing with about 20,000 ItemId's each repeating 10 times. It finishes all the first row calculations instantly, as expected, and then the calculation times go up DRASTICALLY.
As you can see I've tried both an INNER JOIN
and a CROSS JOIN
. I believe they have the same execution plan with the parameters that I've given the CROSS JOIN
.
Is there a more effective/efficient way to accomplish this?
I allowed this to run for 5 hours yesterday to see if it ever ended.. it did not.
Another note: When I'm using this on the test data I SELECT
WITHOUT using ORDER
to hopefully help speed things along. The ORDER
is just for my convenience when I'm fact checking.
Your problem is that you are using a CTE as the source of a recursive CTE. Your first CTE will be executed once for each iteration of your recursive CTE. With your test data that means that CTE_TestValue
is created 8 times.
Put the result of CTE_TestValue
in a temp table that has a clustered primary key on (RowNumber, ItemID)
and use that temporary table as the source of data for the recursive CTE CTE_TestColumnC
.
Also change the join condition in the recursive part to ON MainB.RowNumber = Sub.RowNumber + 1
. That will make the query able to use the index on the temporary table.
DECLARE @TestTable TABLE (
ItemID INT,
A INT,
B INT,
Month INT)
INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)
CREATE TABLE #TestValue
(
ItemID INT,
A INT,
B INT,
Month INT,
RowNumber INT,
primary key(RowNumber, ItemID)
)
INSERT INTO #TestValue
SELECT
Main.ItemID,
Main.A,
Main.B,
Main.Month,
ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
FROM @TestTable AS Main
;WITH CTE_TestColumnC AS (
SELECT
MainA.ItemID,
MainA.A,
MainA.B,
(MainA.B - MainA.A) AS C,
MainA.Month,
MainA.RowNumber
FROM #TestValue AS MainA
WHERE MainA.Rownumber = 1
UNION ALL
SELECT
MainB.ItemID,
MainB.A,
MainB.B,
(Sub.C - MainB.A) AS C,
MainB.Month,
MainB.RowNumber
FROM #TestValue AS MainB
INNER JOIN CTE_TestColumnC AS Sub
ON MainB.RowNumber = Sub.RowNumber + 1
AND MainB.ItemID = Sub.ItemID
)
SELECT
Main.ItemID,
Main.A,
Main.B,
Main.C,
Main.Month,
Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber
DROP TABLE #TestValue
In the query plan for your query the problem is shown in the table scan in the lower right corner. with this test data it is executed 8 times with a total of 64 rows returned:
The query plans for the query with a temporary table:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With