
To find infinite recursive loop in CTE

I'm not a SQL expert, so I'd appreciate any help.

I use a recursive CTE to walk parent chains like the one below.

Child1 --> Parent1

Parent1 --> Parent2

Parent2 --> NULL

If the data was populated incorrectly, I end up with something like the examples below. The CTE then recurses indefinitely and fails with a maximum-recursion error. Because the data set is huge, I cannot check for this bad data manually. Is there a way to find it?

Child1 --> Parent1

Parent1 --> Child1

or

Child1 --> Parent1

Parent1 --> Parent2

Parent2 --> Child1

Interstellar asked Jul 31 '15 06:07


3 Answers

With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.

Setup:

create table hierarchy (id integer, parent_id integer);

insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3), 
(5, 4), 
(3, 5); -- endless loop

Recursive query:

with recursive tree as (
  select id, 
         parent_id, 
         array[id] as all_parents
  from hierarchy
  where parent_id is null
  
  union all
  
  select c.id, 
         c.parent_id,
         p.all_parents||c.id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
     and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
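
The same visited-path idea can be tried without a Postgres server. The following is a minimal sketch, assuming Python's bundled sqlite3 module; SQLite has no array type, so the visited ids are kept in a delimited string (the `path` column and the '/' delimiter are choices made for this illustration, not part of the query above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table hierarchy (id integer, parent_id integer)")
conn.executemany(
    "insert into hierarchy values (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 3), (5, 4), (3, 5)],  # (3, 5) closes the loop
)

# The visited ids are stored as a string such as '/1/3/4/' and a child is
# only joined when its id does not already appear in the parent's path.
rows = conn.execute("""
    with recursive tree(id, parent_id, path) as (
      select id, parent_id, '/' || id || '/'
      from hierarchy
      where parent_id is null

      union all

      select c.id, c.parent_id, p.path || c.id || '/'
      from hierarchy c
        join tree p
          on c.parent_id = p.id
         and instr(p.path, '/' || c.id || '/') = 0
    )
    select id, path from tree order by path
""").fetchall()

for node_id, path in rows:
    print(node_id, path)
```

Wrapping every id in delimiters keeps id 1 from matching inside id 10 or 11.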

To do this for multiple trees at the same time, you need to carry over the ID of the root node to the children:

with recursive tree as (
  select id, 
         parent_id, 
         array[id] as all_parents, 
         id as root_id
  from hierarchy
  where parent_id is null
  
  union all
  
  select c.id, 
         c.parent_id,
         p.all_parents||c.id, 
         p.root_id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
     and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;

Update for Postgres 14

Postgres 14 introduced the (standard compliant) CYCLE option to detect cycles:

with recursive tree as (
  select id, 
         parent_id
  from hierarchy
  where parent_id is null

  union all

  select c.id, 
         c.parent_id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
)
cycle id -- track cycles for this column
   set is_cycle -- adds a boolean column is_cycle
   using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle
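
To make the two generated columns more concrete, here is a rough Python emulation of the bookkeeping, as I understand it: each row carries the path that led to it, a row is flagged when its id already occurs in that path, and flagged rows are not expanded further. The function name `walk_with_cycle_flag` is hypothetical, and this is a sketch of the idea, not of how Postgres implements it.

```python
from collections import deque

# (id, parent_id) rows from the setup above
hierarchy = [(1, None), (2, 1), (3, 1), (4, 3), (5, 4), (3, 5)]

def walk_with_cycle_flag(rows):
    """Return (id, is_cycle, path) tuples, starting from the root rows."""
    children = {}
    for node_id, parent_id in rows:
        children.setdefault(parent_id, []).append(node_id)

    result = []
    queue = deque((root, ()) for root in children.get(None, []))
    while queue:
        node_id, parent_path = queue.popleft()
        is_cycle = node_id in parent_path      # id was already visited on this path
        path = parent_path + (node_id,)
        result.append((node_id, is_cycle, path))
        if not is_cycle:                       # a flagged row is not expanded further
            queue.extend((child, path) for child in children.get(node_id, []))
    return result

flagged = walk_with_cycle_flag(hierarchy)
for row in flagged:
    print(row)
```

Filtering on the flag afterwards, as the query does with where not is_cycle, leaves only the rows reached before the loop closed.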
a_horse_with_no_name answered Oct 08 '22 19:10


You haven't specified the dialect or your column names, so it is difficult to make the perfect example...

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 2, 'SubChild')
-- End random data

;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
    SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
    UNION ALL
    SELECT R.StartingID, R.Level + 1, 
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.*
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description 
    FROM RecursiveCTE 
    ORDER BY StartingID, Level

Something like this will show if/where there are loops in the recursive CTE. Look at the column Loop. With the data as is, there are no loops. The comments show how to change the values to cause a loop.

In the end the recursive CTE builds a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks whether the current ID is already in that "list". If it is, it sets the Loop column to 1. This column is checked in the recursive join (the AND R.Loop = 0).

The ending query uses a MAX() OVER (PARTITION BY ...) to set to 1 the Loop column for a whole "block" of chains.
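
For readers without a SQL Server instance, the delimited-list check can be sketched with Python's bundled sqlite3 module. This is an illustration under assumptions: the table `my_table`, the CTE name `chain`, and the sample rows (including the 10/11/12 loop) are made up for the example, and SQLite's instr() stands in for the LIKE test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (id integer primary key, parent_id integer)")
conn.executemany(
    "insert into my_table values (?, ?)",
    [(1, None), (2, 1), (3, 2),          # a healthy chain
     (10, 12), (11, 10), (12, 11)],      # a loop: 10 -> 12 -> 11 -> 10
)

# Start a chain at every row and climb towards the parent, recording the
# visited ids as '|id1|id2|'. A chain stops climbing once is_loop is 1.
looping_starts = [row[0] for row in conn.execute("""
    with recursive chain(starting_id, parents, is_loop, parent_id) as (
      select id, '|' || id || '|', 0, parent_id
      from my_table
      union all
      select r.starting_id,
             r.parents || mt.id || '|',
             instr(r.parents, '|' || mt.id || '|') > 0,
             mt.parent_id
      from my_table mt
        join chain r
          on r.parent_id = mt.id
         and r.is_loop = 0
    )
    select starting_id
    from chain
    group by starting_id
    having max(is_loop) = 1
    order by starting_id
""")]

print(looping_starts)
```

Note that a row whose chain merely leads into a loop would be reported as well, since its climb also ends on a repeated id.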

A slightly more complex version that generates a "better" report:

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 3, 'SubChild')
-- End random data

-- The "terminal" children (elements that have no children
-- connected to them)
;WITH WithoutChildren AS
(
    SELECT MT1.* FROM #MyTable MT1
        WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)

, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
    SELECT ID, -- StartingID 
        1, -- Level
        '|' + CAST(ID AS VARCHAR(MAX)) + '|', 
        '|' + CAST(Description AS VARCHAR(MAX)) + '|', 
        0, -- Loop
        ParentID
        FROM WithoutChildren
    UNION ALL
    SELECT R.StartingID, -- StartingID
        R.Level + 1, -- Level
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|', 
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.ParentID
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT * FROM RecursiveCTE 
    WHERE ParentID IS NULL OR Loop = 1

This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.

xanatos answered Oct 08 '22 21:10


Here's an alternate method for detecting cycles in adjacency lists (parent/child relationships) where each node can have only one parent, which can be enforced with a unique constraint on the child column (id in the table below). This works by computing the closure table for the adjacency list via a recursive query. It starts by adding every node to the closure table as its own ancestor at level 0, then iteratively walks the adjacency list to expand the closure table. Cycles are detected when a new record's child and ancestor are the same at any level other than the original level zero (0):

-- For PostgreSQL and MySQL 8 use the RECURSIVE keyword in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (

with cte(ancestor, child, lev, cycle) as (
  select id, id, 0, 0 from Table1
  union all
  select cte.ancestor
       , Table1.id
       , case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
       , case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
    from Table1
    join cte
      on cte.child = Table1.PARENT_ID
   where cte.cycle = 0
) -- In Oracle, uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
       ancestor
     , child
     , lev
     , max(cycle) over (partition by ancestor) cycle
  from cte

Given the following adjacency list for Table1:

| parent_id | id |
|-----------|----|
|    (null) |  1 |
|    (null) |  2 |
|         1 |  3 |
|         3 |  4 |
|         1 |  5 |
|         2 |  6 |
|         6 |  7 |
|         7 |  8 |
|         9 | 10 |
|        10 | 11 |
|        11 |  9 |

The above query, which works on SQL Server (and on Oracle, PostgreSQL and MySQL 8 when modified as directed), rightly detects that nodes 9, 10, and 11 participate in a cycle of length 3.
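
The result can also be reproduced locally. The sketch below assumes Python's bundled sqlite3 module and makes two changes for this illustration only: the `cycle` column is renamed `cyc`, and the final window function is replaced with a plain GROUP BY so that only the ancestors on a cycle are listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The unique constraint on id enforces "one parent per node".
conn.execute("create table Table1 (parent_id integer, id integer unique)")
conn.executemany(
    "insert into Table1 values (?, ?)",
    [(None, 1), (None, 2), (1, 3), (3, 4), (1, 5), (2, 6),
     (6, 7), (7, 8), (9, 10), (10, 11), (11, 9)],
)

# Every node starts as its own ancestor at level 0; when a walk arrives
# back at its ancestor, cyc records the length of the cycle.
cycles = conn.execute("""
    with recursive cte(ancestor, child, lev, cyc) as (
      select id, id, 0, 0 from Table1
      union all
      select cte.ancestor,
             Table1.id,
             case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end,
             case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
      from Table1
        join cte on cte.child = Table1.parent_id
      where cte.cyc = 0
    )
    select ancestor, max(cyc) as cycle_length
    from cte
    group by ancestor
    having max(cyc) > 0
    order by ancestor
""").fetchall()

print(cycles)
```

Each row of the output pairs a node that sits on a cycle with the length of that cycle.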

SQL(/DB) Fiddles demonstrating this in various DBs can be found below:

  • Oracle 11gR2
  • SQL Server 2017
  • PostgreSQL 9.5
  • MySQL 8
Sentinel answered Oct 08 '22 19:10