Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Hierarchy - Resolve full path for all ancestors of a given node

I have a hierarchy described by an adjacency list. There is not necessarily a single root element, but I do have data to identify the leaf (terminal) items in the hiearchy. So, a hierachy that looked like this ...

1
- 2
- - 4
- - - 7
- 3
- - 5
- - 6 
8
- 9

... would be described by a table, like this. NOTE: I don't have the ability to change this format.

id  parentid isleaf
--- -------- ------
1   null     0
2   1        0
3   1        0
4   2        0
5   3        1
6   3        1
7   4        1
8   null     0
9   8        1

here is the sample table definition and data:

CREATE TABLE [dbo].[HiearchyTest](
    [id] [int] NOT NULL,
    [parentid] [int] NULL,
    [isleaf] [bit] NOT NULL
)
GO

INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (1, NULL, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (2, 1, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (3, 1, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (4, 2, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (5, 3, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (6, 3, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (7, 4, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (8, NULL, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (9, 8, 1)
GO

From this, I need to provide any id and get a list of all ancestors including all descendents of each. So, if I provided the input of id = 6, I would expect the following:

id descendentid
-- ------------
1  1
1  3
1  6
3  3
3  6
6  6
  • id 6 just has itself
  • its parent, id 3 would have decendents of 3 and 6
  • its parent, id 1 would have decendents of 1, 3, and 6

I will be using this data to provide roll-up calculations at each level in the hierarchy. This works well, assuming I can get the dataset above.

I have accomplished this using two recusive ctes - one to get the "terminal" item for each node in the hiearchy. Then, a second one where I get the full ancestory of my selected node (so, 6 resolves to 6, 3, 1) to walk up and get the full set. I'm hoping that I'm missing something and that this can be accomplished in one round. Here is the example double-recursion code:

declare @test int = 6;

with cte as (

    -- leaf nodes
    select id, parentid, id as terminalid
    from HiearchyTest
    where isleaf = 1

    union all

    -- walk up - preserve "terminal" item for all levels
    select h.id, h.parentid, c.terminalid
    from HiearchyTest as h
    inner join
    cte as c on h.id = c.parentid

)

, cte2 as (

    -- get all ancestors of our test value
    select id, parentid, id as descendentid
    from cte
    where terminalid = @test 

    union all

    -- and walkup from each to complete the set
    select h.id, h.parentid, c.descendentid
    from HiearchyTest h
    inner join cte2 as c on h.id = c.parentid

)

-- final selection - order by is just for readability of this example
select id, descendentid 
from cte2
order by id, descendentid

Additional detail: the "real" hierarchy will be much larger than the example. It can technically have infinite depth, but realistically it would rarely go more than 10 levels deep.

In summary, my question is if I can accomplish this with a single recursive cte instead of having to recurse over the hierarchy twice.

like image 304
snow_FFFFFF Avatar asked Jun 16 '16 21:06

snow_FFFFFF


3 Answers

Because your data is a tree structure, we can use the hierarchyid data type to meet your needs (despite your saying that you can't in the comments). First, the easy part - generating the hierarchyid with a recursive cte

with cte as (

    select id, parentid, 
       cast(concat('/', id, '/') as varchar(max)) as [path]
    from [dbo].[HiearchyTest]
    where ParentID is null

    union all

    select child.id, child.parentid, 
       cast(concat(parent.[path], child.id, '/') as varchar(max))
    from [dbo].[HiearchyTest] as child
    join cte as parent
        on child.parentid = parent.id
)
select id, parentid, cast([path] as hierarchyid) as [path] 
into h
from cte;

Next, a little table-valued function I wrote:

create function dbo.GetAllAncestors(@h hierarchyid, @ReturnSelf bit)
returns table
as return
   select @h.GetAncestor(n.n) as h
   from dbo.Numbers as n
   where n.n <= @h.GetLevel()
      or (@ReturnSelf = 1 and n.n = 0)

   union all

   select @h
   where @ReturnSelf = 1;

Armed with that, getting your desired result set isn't too bad:

declare @h hierarchyid;

set @h = (
    select path
    from h
    where id = 6
);

with cte as (
    select * 
    from h
    where [path].IsDescendantOf(@h) = 1
        or @h.IsDescendantOf([path]) = 1
)
select h.id as parent, c.id as descendentid
from cte as c
cross apply dbo.GetAllAncestors([path], 1) as a
join h
    on a.h = h.[path]
order by h.id, c.id;

Of course, you're missing out on a lot of the benefit of using a hierarchyid by not persisting it (you'll either have to keep it up to date in the side table or generate it every time). But there you go.

like image 155
Ben Thul Avatar answered Oct 30 '22 22:10

Ben Thul


Okay this has been bothering me since I have read the question and I just came back to think of it again..... Anyway, why do you need to recurse back down to get all of the descendants? You have asked for ancestors not descendants and your result set is not trying to get other siblings, grand children, etc.. It is getting a parent and a grand parent in this case. Your First cte gives you everything you need to know except when an ancestor id is also the parentid. So with a union all, a little magic to setup the originating ancestor, and you have everything you need without a second recursion.

declare @test int = 6;

with cte as (

    -- leaf nodes
    select id, parentid, id as terminalid
    from HiearchyTest
    where isleaf = 1

    union all

    -- walk up - preserve "terminal" item for all levels
    select h.id, h.parentid, c.terminalid
    from HiearchyTest as h
    inner join
    cte as c on h.id = c.parentid

)

, cteAncestors AS (

    SELECT DISTINCT
       id = IIF(parentid IS NULL, @Test, id)
       ,parentid = IIF(parentid IS NULL,id,parentid)
    FROM
       cte
    WHERE
       terminalid = @test

    UNION

    SELECT DISTINCT
       id
       ,parentid = id
    FROM
       cte
    WHERE
       terminalid = @test
) 

SELECT
    id = parentid
    ,DecendentId = id
FROM
    cteAncestors
ORDER BY
    id
    ,DecendentId

Your result set from your first cte gives you your 2 ancestors and self related to their ancestor except in the case of the originating ancestors who's parentid is null. That null is a special case I will deal with in a minute.

enter image description here

Remember at this point your query is producing Ancestors not descendants, but what it doesn't give you is self references meaning grandparent = grandparent, parent = parent, self = self. But all you have to do to get that is to add rows for every id and make the parentid equal to their id. hence the union. Now your result set is almost totally shaped up:

enter image description here

The special case of the null parentid. So the null parentid identifies the originating ancestor meaning that ancestor has no other ancestor in your dataset. And here is how you will use that to your advantage. Because you started your initial recursion at the leaf level there is no direct tie to the id that you started with to the originating ancestor, but there is at every other level, simply hijack that null parent id and flip the values around and you now have an ancestor for your leaf.

enter image description here

Then in the end if you want it to be a descendants table switch the columns and you are finished. One last note DISTINCTs are there in case the id is repeated with an additional parentid. E.g. 6 | 3 and another record for 6 | 4

enter image description here

like image 44
Matt Avatar answered Oct 30 '22 20:10

Matt


I'm not sure if this performs better, or even produces the proper results in all cases, but you could capture a node list, then use xml functionality to parse it out and cross apply to the id list:

declare @test int = 6;

;WITH cte AS (SELECT id, parentid, CAST(id AS VARCHAR(MAX)) as IDlist
              FROM HiearchyTest
              WHERE isleaf = 1
              UNION ALL
              SELECT h.id, h.parentid , CAST(CONCAT(c.IDlist,',',h.id) AS VARCHAR(MAX))
              FROM HiearchyTest as h
              JOIN cte as c 
                ON  h.id = c.parentid
            )
    ,cte2 AS (SELECT *, CAST ('<M>' + REPLACE(IDlist, ',', '</M><M>') + '</M>' AS XML) AS Data 
              FROM cte
              WHERE IDlist LIKE '%'+CAST(@test AS VARCHAR(50))+'%'
              )
SELECT id,Split.a.value('.', 'VARCHAR(100)') AS descendentid
FROM cte2 a
CROSS APPLY Data.nodes ('/M') AS Split(a); 
like image 1
Hart CO Avatar answered Oct 30 '22 22:10

Hart CO