Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CTE with recursion - row_number() aggregated

I have a table with a parent/child relationship along with a date created column referencing itself. I want to display each parent record and all the descendants ordered by the most recent 'activity' on the node. So if row number 1 which was created a long time ago has a new child added to it (or a new child added to one its children for example), then I want it at the top of the results.

I'm having trouble getting this to work currently.

My table structure is as follows:

CREATE TABLE [dbo].[Orders](
    [OrderId] [int] NOT NULL,
    [Orders_OrderId] [int] NULL,
    [DateOrdered] datetime)

I have written the following SQL to pull the information out:

WITH allOrders AS 
   (SELECT po.orderid, po.Orders_OrderId, po.DateOrdered, 0 as distance,  
   row_number() over (order by DateOrdered desc) as RN1
    FROM orders po WHERE po.Orders_OrderId is null
    UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, b2.DateOrdered, c.distance + 1, 
    c.RN1
    FROM orders b2 
    INNER JOIN allOrders c 
    ON b2.Orders_OrderId = c.orderid
    )

SELECT * from allOrders
where RN1 between 0 and 2
order by rn1 asc, distance asc

Is there any way I can 'aggregate' the results of the recursive selects, so that I could select the maximum date across an entire 'parent' node?

SQLFiddle demonstration: http://sqlfiddle.com/#!3/ca6cb/11 (record number 1 should be first as it has a child that was updated recently)

Update Thanks to the suggestion from @twrowsell I have the following query which does work but seems really clunky and has some performance issues, I feel I shouldn't have to have 3 CTE's to achieve this. Is there any way it can be condensed whilst preserving the 'row numbers' (as this is for a user display with paging)?

WITH allOrders AS 
  (SELECT po.orderid, po.Orders_OrderId, 0 as distance, po.DateOrdered, po.orderid as [rootId]
    FROM orders po WHERE po.Orders_OrderId is null 
    UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, c.distance + 1, b2.DateOrdered, c.[rootId]
    FROM orders b2     
    INNER JOIN allOrders c 
    ON b2.Orders_OrderId = c.orderid
    ),
    mostRecentOrders as (
    SELECT *,
    MAX(DateOrdered) OVER (PARTITION BY rootId) as [HighestOrderId]
    from allOrders
    ),
    pagedOrders as (
    select *, dense_rank() over (order by [HighestOrderId] desc) as [PagedRowNumber] from mostRecentOrders)

    SELECT  * from pagedOrders
    where PagedRowNumber between 0 and 2
    order by [HighestOrderId] desc

Also, I could use MAX(orderid) as orderid is the ident and datecreated can't be updated in my scenario after its created.

Updated SQLFiddle: http://sqlfiddle.com/#!3/ca6cb/41

like image 757
Ed W Avatar asked Feb 05 '14 07:02

Ed W


4 Answers

First, you need to store the "root" order ID so that you can differentiate between different "trees" of orders. Once you have that, you can aggregate and sort your data.

As far as I can tell, you will need at least one CTE to build the tree and a second to do the rankings since you can't use DENSE_RANK() in a WHERE clause.

The following query uses a temp table to store the tree. The query selects from the tree twice, once for the rows and a second time for the rankings. If I used a CTE for storing the tree, it would have to build it twice, since the CTE is basically just a reusable subquery (it would be rebuilt for each distinct time it is used). Using a temp table ensures that I only need to build it once.

Here is the SQL:

DECLARE @Offset INT = 0;
DECLARE @Fetch INT = 2;

-- Create the Order Trees
WITH OrderTree AS (
  SELECT  po.orderid AS RootOrderID,
          po.orderid,
          po.Orders_OrderId,
          po.DateOrdered,
          0 AS distance
  FROM orders po WHERE po.Orders_OrderId IS NULL
  UNION/**/ALL
  SELECT  parent.RootOrderID,
          child.orderid,
          child.Orders_OrderId,
          child.DateOrdered,
          parent.distance + 1 AS distance
  FROM orders child
  INNER JOIN OrderTree parent
  ON child.Orders_OrderId = parent.orderid
)
SELECT *
INTO #OrderTree
FROM OrderTree;

-- Rank the order trees by MAX(DateOrdered)
WITH
Rankings AS (
    SELECT RootOrderID,
         MAX(DateOrdered) AS MaxDate,
         ROW_NUMBER() OVER(ORDER BY MAX(DateOrdered) DESC, RootOrderID ASC) AS Rank
  FROM #OrderTree
  GROUP BY RootOrderID
)
-- Get the next @Fetch trees, starting at rank @Offset+1
SELECT  TREE.*,
        R.MaxDate,
        R.Rank
FROM Rankings R
INNER JOIN #OrderTree TREE
    ON R.RootOrderID = TREE.RootOrderID
WHERE R.Rank BETWEEN @Offset+1 AND (@Fetch+@Offset)
ORDER BY R.Rank ASC, TREE.distance ASC;

SQLFiddle

Note: The /**/ between UNION and ALL is a workaround for this issue.

I built my own "orders" table using data in an existing table I had in my database and did some bench-marking against the 3-CTE query in the question. This slightly outperforms it on a large pool of data (117 trees with a total of 37215 orders, max depth of 11). I benchmarked by running each query with STATISTICS IO and STATISTICS TIME turned on, clearing the cache and buffers before each run.

Below are the results of both queries, plus the results of the recursive CTE shared by both:

╔════════════╦══════════╦════════════╦══════════════╗
║ Query      ║ CPU Time ║ Scan Count ║ Logical Reads║
╠════════════╬══════════╬════════════╬══════════════╣
║ Tree CTE   ║ 24211ms  ║ 4          ║ 1116243      ║
╟────────────╫──────────╫────────────╫──────────────╢
║ 3-CTE      ║ 24789ms  ║ 7          ║ 1192221      ║
║ Temp Table ║ 24384ms  ║ 6          ║ 1116549      ║
╚════════════╩══════════╩════════════╩══════════════╝

It appears that the bulk of both of these queries is the recursive order-tree CTE. Removing the shared cost of the recursive CTE gives the following results:

╔════════════╦══════════╦════════════╦══════════════╗
║ Query      ║ CPU Time ║ Scan Count ║ Logical Reads║
╠════════════╬══════════╬════════════╬══════════════╣
║ 3-CTE      ║ 578ms    ║ 3          ║ 75978        ║
║ Temp Table ║ 173ms    ║ 2          ║ 306          ║
╚════════════╩══════════╩════════════╩══════════════╝

Based on these results, I highly recommend you add a RootOrderID column to your orders table to avoid having to use a potentially very costly recursive CTE.

like image 101
Jon Senchyna Avatar answered Nov 03 '22 14:11

Jon Senchyna


Would using MAX on DateOrdered in an OVER clause in the outer select work..

    WITH allOrders AS 
(
    SELECT po.orderid, po.Orders_OrderId, po.DateOrdered, 0 as distance,  
       row_number() over (order by DateOrdered desc) as RN1
    FROM orders po WHERE po.Orders_OrderId is null
    UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, b2.DateOrdered, c.distance + 1, 
      c.RN1
    FROM orders b2 
    INNER JOIN allOrders c 
    ON b2.Orders_OrderId = c.orderid
    )


    SELECT *,   MAX(DateOrdered) OVER (PARTITION BY Orders_OrderId) from allOrders
    where RN1 between 0 and 2
    order by rn1 asc, distance asc

EDIT: Sorry I misinterpreted your requirement first time. It looks like you want to partition your results by the RN1 field not Orders_OrderId so your outer select will be something like..

 SELECT MAX(DateOrdered) OVER (PARTITION BY RN1 ),*  from allOrders
where RN1 between 0 and 2
order by rn1 asc, distance asc
like image 35
twrowsell Avatar answered Nov 03 '22 12:11

twrowsell


Have a look at the following:

;WITH allOrders AS 
   (SELECT po.orderid, po.Orders_OrderId, po.DateOrdered, 0 as distance, po.orderid as [parentOrder]
    FROM orders po WHERE po.Orders_OrderId is null
        UNION ALL
    SELECT b2.orderid ,b2.Orders_OrderId, b2.DateOrdered, c.distance + 1, c.[parentOrder]
    FROM orders b2 
    INNER JOIN allOrders c ON b2.Orders_OrderId = c.orderid

    )
    SELECT a.OrderId
           ,a.Orders_OrderId
           ,a.DateOrdered
           ,top1.DateOrdered as HIghestDate
           ,a.distance
           ,a.parentOrder     
    FROM allOrders a
    INNER JOIN (SELECT TOP 2 parentOrder, MAX(DateOrdered)as highestdates FROM allOrders GROUP BY parentOrder ORDER BY MAX(DateOrdered)DESC)b on a.parentOrder=b.parentOrder 
    OUTER APPLY (SELECT TOP 1 parentOrder, DateOrdered FROM allOrders top1 WHERE a.parentOrder=top1.parentOrder ORDER BY top1.DateOrdered DESC)top1

SQLFiddle

like image 35
Kiril Rusev Avatar answered Nov 03 '22 14:11

Kiril Rusev


I'm having a hard time understanding your exact full needs, including the paging situation. You could provide the expected result set for the sample you supplied, it would be easier to check against.

Anyway, it seems your main difficulty is on this:

Is there any way I can 'aggregate' the results of the recursive selects, so that I could select the maximum date across an entire 'parent' node?

... this can be easily done with that recursive CTE and an APPLY.

I'm not sure of what do you want exactly, so I made these two fiddles:

SQL Fiddle 1 - Here all children are together based on the root order, i.e., order 3 is with its parent's (order 2) parent (order 1).

SQL Fiddle 2 - Here the children are grouped with their immediate parent, and also parents become roots, so order 2 doesn't get to the top with its parent (order 1).

I think you'll be going for some modification of the first one.

Again, it's really important to provide your expected result in questions like this one, or you'll get a lot of trial and error answers.

like image 27
Pedro Fialho Avatar answered Nov 03 '22 12:11

Pedro Fialho