Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Graph problems: connect by NOCYCLE prior replacement in SQL server?

question:

I have the following (directed) graph: Graph

And this table:

CREATE TABLE [dbo].[T_Hops](
    [UID] [uniqueidentifier] NULL,
    [From] [nvarchar](1000) NULL,
    [To] [nvarchar](1000) NULL,
    [Distance] [decimal](18, 5) NULL
) ON [PRIMARY]

GO

And this content:

      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'A'              ,'E'              ,10.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'E'              ,'D'              ,20.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'A'              ,'B'              ,5.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'B'              ,'C'              ,10.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'C'              ,'D'              ,5.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'A'              ,'F'              ,2.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'F'              ,'G'              ,6.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'G'              ,'H'              ,3.00000              );   
      INSERT INTO [dbo].[T_Hops]             ([UID]             ,[From]             ,[To]             ,[Distance])       VALUES             (newid()              ,'H'              ,'D'              ,1.00000              );   

Now I can query the best connection from point x to point y like this:

WITH AllRoutes 
(
     [UID]
    ,[FROM]
    ,[To]
    ,[Distance]
    ,[Path]
    ,[Hops]
)
AS
(
    SELECT 
         [UID]
        ,[FROM]
        ,[To]
        ,[Distance]
        ,CAST(([dbo].[T_Hops].[FROM] + [dbo].[T_Hops].[To]) AS varchar(MAX)) AS [Path]
        ,1 AS [Hops]
      FROM [dbo].[T_Hops]
      WHERE [FROM] = 'A'

    UNION ALL


    SELECT 
         [dbo].[T_Hops].[UID]
        --,[dbo].[T_Hops].[FROM]
        ,Parent.[FROM]
        ,[dbo].[T_Hops].[To]
        ,CAST((Parent.[Distance] + [dbo].[T_Hops].[Distance]) AS [decimal](18, 5)) AS distance
        ,CAST((Parent.[Path] + '/' + [dbo].[T_Hops].[FROM] + [dbo].[T_Hops].[To]) AS varchar(MAX)) AS [Path]
        ,(Parent.[Hops] + 1) AS [Hops]
     FROM [dbo].[T_Hops]

    INNER JOIN AllRoutes AS Parent 
            ON Parent.[To] = [dbo].[T_Hops].[FROM] 

)

SELECT TOP 100 PERCENT * FROM AllRoutes


/*
WHERE [FROM] = 'A' 
AND [To] = 'D'
AND CHARINDEX('F', [Path]) != 0 -- via F
ORDER BY Hops, Distance ASC
*/

GO

Now I want to create an undirected graph, for that I can, for example, also get the path from D to A

I start with a most simple change, and just ad the reverse direction for HD.

INSERT INTO [dbo].[T_Hops]
           ([UID]
           ,[From]
           ,[To]
           ,[Distance])
     VALUES
           (newid() --<UID, uniqueidentifier,>
           ,'D' --<From, nvarchar(1000),>
           ,'H' --<To, nvarchar(1000),>
           ,1 --<Distance, decimal(18,5),>
           )
GO

Now, as expected, my query throws an exception:

Infinite recursion / max recursion level (100) exceeded

Because the number of possible connections is now infinite.

Now in Oracle you do the same thing with "connect by prior" instead of a tree. And if a cycle problem (infinite recursion) is possible, you just add NOCYCLE to CONNECT BY PRIOR, making it "CONNECT BY NOCYCLE PRIOR"

Now in MS-SQL, I fixed that behaviour by adding:

AND Parent.[Path] NOT LIKE '%' + [dbo].[T_Hops].[FROM] + '/%'

to the inner join clause, essentially emulating NOCYCLE.

However, as LIKE is basically strstr (or worse strcasestr), and thus maximally slower than checking an array of parent elements, I am extremely worried about performance.

After all, this is just an example, and I intend to basically add data for an entire country. So the end result would potentially be extremely slow.

Anybody else has a better (=faster) method of how to replace NOCYCLE in MS SQL ?

Or is this the point where I simply have no other option but switching to Oracle (for doing this in an acceptable speed) ?

Note: Any temp tables (large amount of data) solution will be slower, because the temp tables will be swapped to the HardDisk when there is not enough RAM (absolute certainty).

Same goes for any solution using functions and table-valued functions.

like image 695
Stefan Steiger Avatar asked Aug 18 '11 10:08

Stefan Steiger


1 Answers

To improve select performance store possible paths between nodes in a permanent table

TABLE T_Hops_Path
(
  FromNode,
  ToNode,
  HopCount,
  TotalDistance
)

If your tree structure does not change often you can write a stored procedure that generates this table every N hours.

like image 196
alexm Avatar answered Sep 21 '22 23:09

alexm