Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to identify groups/clusters in set of arcs/edges in SQL?

Tags:

sql

teradata

I have arcs/edges like these ‘haves’:

Node1   Node2
A       B
B       C
D       E

Here A is connected to B and B to C. D is connected to E. In other words there are 2 groups/clusters shown in these ‘wants’:

Node1   Node2   Cluster
A       B       1
B       C       1
D       E       2

Could I use SQL to identify these groups/clusters? I guess this involves self-joins but I cannot see how to write this SQL. Any feedback would be very much appreciated. Thanks!

like image 448
cs0815 Avatar asked Jul 20 '17 11:07

cs0815


2 Answers

Assuming from the sample data provided in question, that there is no loops like A->B->C->A in data, Following is the query that will return the desired output from nodes table.

WITH RECURSIVE NodeCluster (node1,node2,Cluster1) AS
      (SELECT node1,
              node2,
              Rank() Over (
                           ORDER BY node1)
       FROM nodes AS n1
       WHERE NOT EXISTS
           (SELECT *
            FROM nodes AS n2
            WHERE n1.node1 = n2.node2)
       UNION ALL SELECT N1.node1,
                        N1.node2,
                        NodeCluster.Cluster1
       FROM nodes n1,
            NodeCluster
       WHERE NodeCluster.node2=n1.node1 )
    SELECT *
    FROM NodeCluster
    ORDER BY Cluster1,
             node1,
             node2;

In seed query all the starting nodes are selected and ranked in asc to assign Cluster number to data in ascending order.

On the data provided in question, following is the output.

Node1 | Node2  | Cluster1
-------------------------
A       B        1
B       C        1
D       E        2

For re-assurance, more data has been added to the sample data as below.

Node1 | Node2
-------------
A        B   
B        C   
D        E   
E        F   
F        G   
H        I   
I        J   
J        K   
L        M  

The query resulted in below output.

Node1 | Node2  | Cluster1
-------------------------
A       B         1
B       C         1
D       E         2
E       F         2
F       G         2
H       I         3
I       J         3
J       K         3
L       M         4

The solution query is successfully tested both with Teradata SQL Assistant and bteq in teradata and ANSI modes.

Hope this will help.

like image 107
zarruq Avatar answered Oct 18 '22 00:10

zarruq


Try this:

    DECLARE @Edges TABLE
    (
      ID INT ,
      Node1 CHAR(1) ,
      Node2 CHAR(1)
    );
INSERT  INTO @Edges
        ( Node1, Node2 )
VALUES  ( 'A', 'B' ),
        ( 'B', 'C' ),
        ( 'D', 'E' );
WITH    CTE_Nodes
          AS ( SELECT   Node1 AS Node
               FROM     @Edges
               UNION
               SELECT   Node2 AS Node
               FROM     @Edges
             ),
        CTE_Pairs
          AS ( SELECT   Node1 ,
                        Node2
               FROM     @Edges
               WHERE    Node1 <> Node2
               UNION
               SELECT   Node2 AS Node1 ,
                        Node1 AS Node2
               FROM     @Edges
               WHERE    Node1 <> Node2
             ),
        CTE_Recursive
          AS ( SELECT   CAST(CTE_Nodes.Node AS VARCHAR(8000)) AS AnchorNode ,
                        Node1 ,
                        Node2 ,
                        CAST(',' + Node1 + ',' + Node2 + ',' AS VARCHAR(8000)) AS NodePath ,
                        1 AS Lvl
               FROM     CTE_Pairs
                        INNER JOIN CTE_Nodes ON CTE_Nodes.Node = CTE_Pairs.Node1
               UNION ALL
               SELECT   CTE_Recursive.AnchorNode ,
                        CTE_Pairs.Node1 ,
                        CTE_Pairs.Node2 ,
                        CAST(CTE_Recursive.NodePath + CTE_Pairs.Node2 + ',' AS VARCHAR(8000)) AS NodePath ,
                        CTE_Recursive.Lvl + 1 AS Lvl
               FROM     CTE_Pairs
                        INNER JOIN CTE_Recursive ON CTE_Recursive.Node2 = CTE_Pairs.Node1
               WHERE    CTE_Recursive.NodePath NOT LIKE CAST('%,'
                        + CTE_Pairs.Node2 + ',%' AS VARCHAR(8000))
             ),
        CTE_RecursionResult
          AS ( SELECT   AnchorNode ,
                        Node1 ,
                        Node2
               FROM     CTE_Recursive
             ),
        CTE_CleanResult
          AS ( SELECT   AnchorNode ,
                        Node1 AS Node
               FROM     CTE_RecursionResult
               UNION
               SELECT   AnchorNode ,
                        Node2 AS Node
               FROM     CTE_RecursionResult
             )
    SELECT  Edges.Node1 ,
            Edges.Node2 ,
            DENSE_RANK() OVER ( ORDER BY CASE WHEN CA_Data.XML_Value IS NULL
                                              THEN Edges.Node1
                                              ELSE CA_Data.XML_Value
                                         END ) AS Cluster
    FROM    @Edges Edges
            CROSS APPLY ( SELECT    CTE_CleanResult.Node + ','
                          FROM      CTE_CleanResult
                          WHERE     CTE_CleanResult.AnchorNode = Edges.Node1
                          ORDER BY  CTE_CleanResult.Node
                        FOR
                          XML PATH('') ,
                              TYPE
                        ) AS CA_XML ( XML_Value )
            CROSS APPLY ( SELECT    CA_XML.XML_Value.value('.',
                                                           'NVARCHAR(MAX)')
                        ) AS CA_Data ( XML_Value );

SQL Fiddle DEMO

like image 45
Ali Adlavaran Avatar answered Oct 18 '22 00:10

Ali Adlavaran