Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find topmost parents in self-referential table

Tags:

sql

sql-server

Given a number of IDs, each possibly connected to other entries in the same table, I'd like to find their top level parents. I.e. those rows for which the parent ID is NULL. So below each red unit would be connected up the hierarchy to a corresponding green one.

Hierarchy

Borrowing from this answer in a very similar question, here's a toy schema.

DECLARE @t TABLE (ID INT, link INT)

INSERT INTO @t VALUES
(1, NULL),
(2, 1),
(3, 2),
(4, 3),
(5, 3),
(6, 2),
(7, 1),

(8, NULL),
(9, 8),
(10, 9),
(11, 9),
(12, 9),
(13, 12),
(14, 12),
(15, 8);

I would have a set of IDs, 6 and 13 for an example with two bottom-level nodes. Then I'd want a result set like (6, 1) and (13, 8). To construct every link all the way up, the answer suggested a common table expression.

WITH cte AS (
    SELECT ID, link
    FROM @t
    WHERE ID IN (6, 13)

    UNION ALL

    SELECT t.ID, t.link
    FROM @t t
    JOIN cte c ON t.ID = c.link
    WHERE t.link IS NOT NULL
)

SELECT *
FROM cte

Which yields this result:

 ID | link
----+------
  6 |   2
 13 |  12
 12 |   9
  9 |   8
  2 |   1

However, I'm not sure how could I combine this into one result for each starting point. For one ID, I could perhaps choose the last row of the result set and get the link ID, but not for multiple. Mind you, there can naturally be multiple top-level parents (although branching occurs only down, so only one parent for a given node), and mid-level entries can be chosen as starting points as well.

Instead of a UNION ALL I naïvely tried to JOIN, but it turned out such CTEs are not allowed.

Here are all the nodes colored red above: (3, 6, 11, 13, 15). They should map to (1, 1, 8, 8, 8).

like image 911
Felix Avatar asked Aug 05 '19 09:08

Felix


3 Answers

2 issues with the code:

  1. You need to track your starting ID through the recursion;
  2. The where condition in the recursive part is actually preventing you from getting the result.

As such:

WITH cte AS (
    SELECT ID, link, ID as [StartID]
    FROM @t
    WHERE ID IN (6, 7)
    UNION ALL
    SELECT t.ID, t.link, c.StartID
    FROM @t t
    JOIN cte c ON t.ID = c.link
)
SELECT c.StartID, c.ID
FROM cte c
where c.link is null;
like image 168
Roger Wolf Avatar answered Nov 12 '22 01:11

Roger Wolf


Remember the starting point, track level, take max level.

DECLARE @t TABLE (ID INT, link INT)

INSERT INTO @t VALUES
(1, NULL),
(2, 1),
(3, 1),
(4, 2),
(5, 3),
(6, 4),
(7, 5);

WITH cte AS (
    SELECT ID, link, 1 as [level], id as [start]
    FROM @t
    WHERE ID IN (6, 7)

    UNION ALL

    SELECT t.ID, t.link, c.[level] + 1, c.[start]
    FROM @t t
    JOIN cte c ON t.ID = c.link
    WHERE t.link IS NOT NULL
)

SELECT TOP(1) WITH TIES [start], link
FROM cte
ORDER BY [level] DESC;
like image 1
Serg Avatar answered Nov 12 '22 01:11

Serg


You can tweak your query to get what you want:

WITH cte AS (
    SELECT ID as orig_id, ID, link, 1 as lev
    FROM @t
    WHERE ID IN (6, 7)

    UNION ALL

    SELECT c.orig_id, t.ID, t.link, lev + 1
    FROM @t t JOIN
         cte c
         ON t.ID = c.link
    WHERE t.link IS NOT NULL
)
SELECT orig_id, link
FROM (SELECT c.*, MAX(lev) OVER (PARTITION BY orig_id) as max_lev
      FROM cte c
     )  c
WHERE lev = max_lev;

The changes are:

  • Including the original id in the CTE.
  • Keeping track of the level ("height") of the relationship.
  • Choosing the highest level for each of the original ids.
like image 1
Gordon Linoff Avatar answered Nov 12 '22 03:11

Gordon Linoff