Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieve hierarchical groups ... with infinite recursion

I've a table like this which contains links :

key_a    key_b
--------------
a        b        
b        c
g        h     
a        g       
c        a
f        g

not really tidy & infinite recursion ...

key_a = parent key_b = child

Require a query which will recompose and attribute a number for each hierarchical group (parent + direct children + indirect children) :

key_a    key_b    nb_group
--------------------------
a        b        1
a        g        1
b        c        1
**c        a**        1
f        g        2
g        h        2

**link responsible of infinite loop**

Because we have

A-B-C-A

-> Only want to show simply the link as shown.

Any idea ?

Thanks in advance

like image 936
Visée Maxence Avatar asked Jul 30 '13 13:07

Visée Maxence


2 Answers

The problem is that you aren't really dealing with strict hierarchies; you're dealing with directed graphs, where some graphs have cycles. Notice that your nbgroup #1 doesn't have any canonical root-- it could be a, b, or c due to the cyclic reference from c-a.

The basic way of dealing with this is to think in terms of graph techniques, not recursion. In fact, an iterative approach (not using a CTE) is the only solution I can think of in SQL. The basic approach is explained here.

Here is a SQL Fiddle with a solution that addresses both the cycles and the shared-leaf case. Notice it uses iteration (with a failsafe to prevent runaway processes) and table variables to operate; I don't think there's any getting around this. Note also the changed sample data (a-g changed to a-h; explained below).

If you dig into the SQL you'll notice that I changed some key things from the solution given in the link. That solution was dealing with undirected edges, whereas your edges are directed (if you used undirected edges the entire sample set is a single component because of the a-g connection).

This gets to the heart of why I changed a-g to a-h in my sample data. Your specification of the problem is straightforward if only leaf nodes are shared; that's the specification I coded to. In this case, a-h and g-h can both get bundled off to their proper components with no problem, because we're concerned about reachability from parents (even given cycles).

However, when you have shared branches, it's not clear what you want to show. Consider the a-g link: given this, g-h could exist in either component (a-g-h or f-g-h). You put it in the second, but it could have been in the first instead, right? This ambiguity is why I didn't try to address it in this solution.

Edit: To be clear, in my solution above, if shared braches ARE encountered, it treats the whole set as a single component. Not what you described above, but it will have to be changed after the problem is clarified. Hopefully this gets you close.

like image 50
Dominic P Avatar answered Nov 19 '22 04:11

Dominic P


You should use a recursive query. In the first part we select all records which are top level nodes (have no parents) and using ROW_NUMBER() assign them group ID numbers. Then in the recursive part we add to them children one by one and use parent's groups Id numbers.

with CTE as 
(

select t1.parent,t1.child,
       ROW_NUMBER() over (order by t1.parent) rn

from t t1 where 
not exists (select 1 from t where child=t1.parent)
union all
select t.parent,t.child, CTE.rn
from t  
join CTE on t.parent=CTE.Child  
)
select * from CTE
order by RN,parent

SQLFiddle demo

like image 2
valex Avatar answered Nov 19 '22 03:11

valex