Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently querying a directed/undirected table of graph edges in SQL Server

I've got a SQL server table in which each row represents an edge in a graph network. The FromNodeID and ToNodeID are foreign keys to a node table, and the schema is something like this:

CREATE TABLE #Edges (
  EdgeID int identity (1,1),
  FromNodeID int,
  ToNodeID int
  );

INSERT INTO #Edges (FromNodeID, ToNodeID) VALUES
  (1,2),
  (1,3),
  (1,4),
  (2,3),
  (3,5),
  (4,5),
  (5,6);

Now, if I consider each edge to be directed (i.e., one way), then it's easy to work out all those nodes that I can get to directly from any node. I'd add an index to the FromNodeID column, and then run a query like this:

SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3

Result: 5

But what would be the best way to structure my table/query if I want to treat each edge as unidirectional. i.e. starting from node 3, I'd like to get the results:

Result: 1, 2, 5

The simplest way I can think of would be to add an additional index to the ToNodeID column and then run a query like this:

SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3 
UNION SELECT FromNodeID FROM #Edges WHERE ToNodeID = 3;

But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query? (Note that I don't want to insert the reversed edges again into the table - I need to be able to treat the edges as either directed or undirected at runtime).

Thanks for any advice!

like image 995
Clare Nia Avatar asked Jan 26 '11 21:01

Clare Nia


2 Answers

But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query?

This is efficient enough.

You could do this:

SELECT  CASE 3 WHEN FromNodeId THEN ToNodeId ELSE FromNodeId END
FROM    Edges
WHERE   3 IN (FromNodeId, ToNodeId)

but this will be essentially the same (will UNION these indexes under the hood).

Here's a script to test:

CREATE TABLE #Edges
        (
        EdgeID INT IDENTITY (1,1) PRIMARY KEY,
        FromNodeID int NOT NULL,
        ToNodeID int NOT NULL
        )
CREATE INDEX ix_edges_from ON #Edges (FromNodeID, ToNodeId)
CREATE INDEX ix_edges_to ON #Edges (ToNodeID, FromNodeId)
;
WITH    q (rn) AS
        (
        SELECT  1
        UNION ALL
        SELECT  rn + 1
        FROM    q
        WHERE   rn < 1000
        )
INSERT
INTO    #Edges (FromNodeId, ToNodeId)
SELECT  q1.rn, q2.rn
FROM    q q1
CROSS JOIN
        q q2
WHERE   (q1.rn + q2.rn) % 37 = 0
OPTION (MAXRECURSION 0)

For the UNION:

SELECT  ToNodeId
FROM    #Edges
WHERE   FromNodeId = 3
UNION
SELECT  FromNodeId
FROM    #Edges
WHERE   ToNodeId = 3


  |--Stream Aggregate(GROUP BY:([Union1006]))
       |--Merge Join(Concatenation)
            |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
            |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)

For the IN:

  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN (3)=[tempdb].[dbo].[#Edges].[FromNodeID] THEN [tempdb].[dbo].[#Edges].[ToNodeID] ELSE [tempdb].[dbo].[#Edges].[FromNodeID] END))
       |--Sort(DISTINCT ORDER BY:([tempdb].[dbo].[#Edges].[EdgeID] ASC))
            |--Concatenation
                 |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
                 |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)

As you can see, the plans are essentially the same: they both take values from the corresponding indexes and concatenate the results.

The UNION query is in fact a little more efficient, since it uses a Merge Join to concatenate the results, and the records come out of the merge join naturally ordered, so the Stream Aggregate does not need to sort.

like image 149
Quassnoi Avatar answered Oct 04 '22 23:10

Quassnoi


Must you process the graph directly from SQL Server? If you are really concerned about performance, you should use one of the data-structures specifically for representing and processing graphs. Most of the work I have done with graphs (and I have done plenty) would have been infeasible if I used a generic database backend to consult the graphs.

One of the most effective representations that I have used is described in the appendices of a compiler book I have: Engineering a Compiler, by Keith Cooper and Linda Torczon.

like image 33
Kevin A. Naudé Avatar answered Oct 04 '22 23:10

Kevin A. Naudé