I have arcs/edges like these ‘haves’:
Node1 Node2
A B
B C
D E
Here A is connected to B and B to C. D is connected to E. In other words there are 2 groups/clusters shown in these ‘wants’:
Node1 Node2 Cluster
A B 1
B C 1
D E 2
Could I use SQL to identify these groups/clusters? I guess this involves self-joins but I cannot see how to write this SQL. Any feedback would be very much appreciated. Thanks!
Assuming from the sample data provided in question, that there is no loops like A->B->C->A
in data, Following is the query that will return the desired output from nodes
table.
WITH RECURSIVE NodeCluster (node1,node2,Cluster1) AS
(SELECT node1,
node2,
Rank() Over (
ORDER BY node1)
FROM nodes AS n1
WHERE NOT EXISTS
(SELECT *
FROM nodes AS n2
WHERE n1.node1 = n2.node2)
UNION ALL SELECT N1.node1,
N1.node2,
NodeCluster.Cluster1
FROM nodes n1,
NodeCluster
WHERE NodeCluster.node2=n1.node1 )
SELECT *
FROM NodeCluster
ORDER BY Cluster1,
node1,
node2;
In seed query all the starting nodes are selected and ranked
in asc
to assign Cluster number to data in ascending order.
On the data provided in question, following is the output.
Node1 | Node2 | Cluster1
-------------------------
A B 1
B C 1
D E 2
For re-assurance, more data has been added to the sample data as below.
Node1 | Node2
-------------
A B
B C
D E
E F
F G
H I
I J
J K
L M
The query resulted in below output.
Node1 | Node2 | Cluster1
-------------------------
A B 1
B C 1
D E 2
E F 2
F G 2
H I 3
I J 3
J K 3
L M 4
The solution query is successfully tested both with Teradata SQL Assistant
and bteq
in teradata
and ANSI
modes.
Hope this will help.
Try this:
DECLARE @Edges TABLE
(
ID INT ,
Node1 CHAR(1) ,
Node2 CHAR(1)
);
INSERT INTO @Edges
( Node1, Node2 )
VALUES ( 'A', 'B' ),
( 'B', 'C' ),
( 'D', 'E' );
WITH CTE_Nodes
AS ( SELECT Node1 AS Node
FROM @Edges
UNION
SELECT Node2 AS Node
FROM @Edges
),
CTE_Pairs
AS ( SELECT Node1 ,
Node2
FROM @Edges
WHERE Node1 <> Node2
UNION
SELECT Node2 AS Node1 ,
Node1 AS Node2
FROM @Edges
WHERE Node1 <> Node2
),
CTE_Recursive
AS ( SELECT CAST(CTE_Nodes.Node AS VARCHAR(8000)) AS AnchorNode ,
Node1 ,
Node2 ,
CAST(',' + Node1 + ',' + Node2 + ',' AS VARCHAR(8000)) AS NodePath ,
1 AS Lvl
FROM CTE_Pairs
INNER JOIN CTE_Nodes ON CTE_Nodes.Node = CTE_Pairs.Node1
UNION ALL
SELECT CTE_Recursive.AnchorNode ,
CTE_Pairs.Node1 ,
CTE_Pairs.Node2 ,
CAST(CTE_Recursive.NodePath + CTE_Pairs.Node2 + ',' AS VARCHAR(8000)) AS NodePath ,
CTE_Recursive.Lvl + 1 AS Lvl
FROM CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Node2 = CTE_Pairs.Node1
WHERE CTE_Recursive.NodePath NOT LIKE CAST('%,'
+ CTE_Pairs.Node2 + ',%' AS VARCHAR(8000))
),
CTE_RecursionResult
AS ( SELECT AnchorNode ,
Node1 ,
Node2
FROM CTE_Recursive
),
CTE_CleanResult
AS ( SELECT AnchorNode ,
Node1 AS Node
FROM CTE_RecursionResult
UNION
SELECT AnchorNode ,
Node2 AS Node
FROM CTE_RecursionResult
)
SELECT Edges.Node1 ,
Edges.Node2 ,
DENSE_RANK() OVER ( ORDER BY CASE WHEN CA_Data.XML_Value IS NULL
THEN Edges.Node1
ELSE CA_Data.XML_Value
END ) AS Cluster
FROM @Edges Edges
CROSS APPLY ( SELECT CTE_CleanResult.Node + ','
FROM CTE_CleanResult
WHERE CTE_CleanResult.AnchorNode = Edges.Node1
ORDER BY CTE_CleanResult.Node
FOR
XML PATH('') ,
TYPE
) AS CA_XML ( XML_Value )
CROSS APPLY ( SELECT CA_XML.XML_Value.value('.',
'NVARCHAR(MAX)')
) AS CA_Data ( XML_Value );
SQL Fiddle DEMO
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With