Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TSQL - Recursive CTE inefficient - Need an alternative

Here is a table with sample data:

DECLARE @TestTable TABLE (
    ItemID INT,
    A INT,
    B INT,
    Month INT)

INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)

Something to note is that the B column is always the same as it's only used once in this calculation, but is needed for the initial calculation.

I am attempting to subtract B from A on the first row, then on subsequent rows subtract the previous rows difference from A. Effectively, B - A = C on the first then C - A on all subsequent rows FOR THE RELATED ItemID.

Here are the results I'm expecting:

ItemID  A   B   C   Month   RowNumber
1234    5   9   4   1       1
1234    6   9   -2  2       2
1234    1   9   -3  3       3
1324    14  6   -8  1       1
1324    5   6   -13 2       2
1324    9   6   -22 3       3
4321    5   11  6   1       1
4321    12  11  -6  2       2

Here is how I am accomplishing this.

;WITH CTE_TestValue AS (
    SELECT 
        Main.ItemID,
        Main.A,
        Main.B,
        Main.Month,
        ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
    FROM @TestTable AS Main
),
CTE_TestColumnC AS (
    SELECT 
        MainA.ItemID,
        MainA.A,
        MainA.B,
        (MainA.B - MainA.A) AS C,
        MainA.Month,
        MainA.RowNumber
    FROM CTE_TestValue AS MainA
        WHERE MainA.Rownumber = 1

    UNION ALL

    SELECT 
        MainB.ItemID,
        MainB.A,
        MainB.B,
        (Sub.C - MainB.A) AS C,
        MainB.Month,
        MainB.RowNumber
    FROM CTE_TestValue AS MainB
        INNER JOIN CTE_TestColumnC AS Sub
            ON MainB.RowNumber - 1 = Sub.RowNumber
            AND MainB.ItemID = Sub.ItemID
--      CROSS JOIN CTE_TestColumnC AS Sub
--          WHERE Sub.RowNumber + 1 = MainB.RowNumber
--          AND MainB.ItemID = Sub.ItemID 
)
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.C,
    Main.Month,
    Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber

This works fine on a small data-sample, but I'm dealing with about 20,000 ItemId's each repeating 10 times. It finishes all the first row calculations instantly, as expected, and then the calculation times go up DRASTICALLY.

As you can see I've tried both an INNER JOIN and a CROSS JOIN. I believe they have the same execution plan with the parameters that I've given the CROSS JOIN.

Is there a more effective/efficient way to accomplish this?

I allowed this to run for 5 hours yesterday to see if it ever ended.. it did not.

Another note: When I'm using this on the test data I SELECT WITHOUT using ORDER to hopefully help speed things along. The ORDER is just for my convenience when I'm fact checking.

like image 907
jayEss Avatar asked Oct 10 '12 19:10

jayEss


1 Answers

Your problem is that you are using a CTE as the source of a recursive CTE. Your first CTE will be executed once for each iteration of your recursive CTE. With your test data that means that CTE_TestValue is created 8 times.

Put the result of CTE_TestValue in a temp table that has a clustered primary key on (RowNumber, ItemID) and use that temporary table as the source of data for the recursive CTE CTE_TestColumnC.

Also change the join condition in the recursive part to ON MainB.RowNumber = Sub.RowNumber + 1. That will make the query able to use the index on the temporary table.

DECLARE @TestTable TABLE (
    ItemID INT,
    A INT,
    B INT,
    Month INT)

INSERT INTO @TestTable VALUES (1234, 5, 9, 1)
INSERT INTO @TestTable VALUES (1234, 6, 9, 2)
INSERT INTO @TestTable VALUES (4321, 5, 11, 1)
INSERT INTO @TestTable VALUES (4321, 12, 11, 2)
INSERT INTO @TestTable VALUES (1324, 14, 6, 1)
INSERT INTO @TestTable VALUES (1324, 5, 6, 2)
INSERT INTO @TestTable VALUES (1234, 1, 9, 3)
INSERT INTO @TestTable VALUES (1324, 9, 6, 3)

CREATE TABLE #TestValue
(
  ItemID INT,
  A INT,
  B INT,
  Month INT,
  RowNumber INT,
  primary key(RowNumber, ItemID)
)

INSERT INTO #TestValue
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.Month,
    ROW_NUMBER() OVER (Partition BY Main.ItemID ORDER BY Main.Month) AS RowNumber
FROM @TestTable AS Main


;WITH CTE_TestColumnC AS (
    SELECT 
        MainA.ItemID,
        MainA.A,
        MainA.B,
        (MainA.B - MainA.A) AS C,
        MainA.Month,
        MainA.RowNumber
    FROM #TestValue AS MainA
        WHERE MainA.Rownumber = 1

    UNION ALL

    SELECT 
        MainB.ItemID,
        MainB.A,
        MainB.B,
        (Sub.C - MainB.A) AS C,
        MainB.Month,
        MainB.RowNumber
    FROM #TestValue AS MainB
        INNER JOIN CTE_TestColumnC AS Sub
            ON MainB.RowNumber = Sub.RowNumber + 1
            AND MainB.ItemID = Sub.ItemID
)
SELECT 
    Main.ItemID,
    Main.A,
    Main.B,
    Main.C,
    Main.Month,
    Main.RowNumber
FROM CTE_TestColumnC AS Main
ORDER BY ItemID, Month, RowNumber

DROP TABLE #TestValue

In the query plan for your query the problem is shown in the table scan in the lower right corner. with this test data it is executed 8 times with a total of 64 rows returned:

enter image description here

The query plans for the query with a temporary table: enter image description hereenter image description here

like image 170
Mikael Eriksson Avatar answered Sep 19 '22 11:09

Mikael Eriksson