Efficiently querying a directed/undirected table of graph edges in SQL Server

Question

I've got a SQL server table in which each row represents an edge in a graph network. The FromNodeID and ToNodeID are foreign keys to a node table, and the schema is something like this:

CREATE TABLE #Edges (
  EdgeID int identity (1,1),
  FromNodeID int,
  ToNodeID int
  );

INSERT INTO #Edges (FromNodeID, ToNodeID) VALUES
  (1,2),
  (1,3),
  (1,4),
  (2,3),
  (3,5),
  (4,5),
  (5,6);

Now, if I consider each edge to be directed (i.e., one way), then it's easy to work out all those nodes that I can get to directly from any node. I'd add an index to the FromNodeID column, and then run a query like this:

SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3

Result: 5

But what would be the best way to structure my table/query if I want to treat each edge as unidirectional. i.e. starting from node 3, I'd like to get the results:

Result: 1, 2, 5

The simplest way I can think of would be to add an additional index to the ToNodeID column and then run a query like this:

SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3 
UNION SELECT FromNodeID FROM #Edges WHERE ToNodeID = 3;

But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query? (Note that I don't want to insert the reversed edges again into the table - I need to be able to treat the edges as either directed or undirected at runtime).

Thanks for any advice!

Quassnoi · Accepted Answer

But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query?

This is efficient enough.

You could do this:

SELECT  CASE 3 WHEN FromNodeId THEN ToNodeId ELSE FromNodeId END
FROM    Edges
WHERE   3 IN (FromNodeId, ToNodeId)

but this will be essentially the same (will UNION these indexes under the hood).

Here's a script to test:

CREATE TABLE #Edges
        (
        EdgeID INT IDENTITY (1,1) PRIMARY KEY,
        FromNodeID int NOT NULL,
        ToNodeID int NOT NULL
        )
CREATE INDEX ix_edges_from ON #Edges (FromNodeID, ToNodeId)
CREATE INDEX ix_edges_to ON #Edges (ToNodeID, FromNodeId)
;
WITH    q (rn) AS
        (
        SELECT  1
        UNION ALL
        SELECT  rn + 1
        FROM    q
        WHERE   rn < 1000
        )
INSERT
INTO    #Edges (FromNodeId, ToNodeId)
SELECT  q1.rn, q2.rn
FROM    q q1
CROSS JOIN
        q q2
WHERE   (q1.rn + q2.rn) % 37 = 0
OPTION (MAXRECURSION 0)

For the UNION:

SELECT  ToNodeId
FROM    #Edges
WHERE   FromNodeId = 3
UNION
SELECT  FromNodeId
FROM    #Edges
WHERE   ToNodeId = 3


  |--Stream Aggregate(GROUP BY:([Union1006]))
       |--Merge Join(Concatenation)
            |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
            |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)

For the IN:

  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN (3)=[tempdb].[dbo].[#Edges].[FromNodeID] THEN [tempdb].[dbo].[#Edges].[ToNodeID] ELSE [tempdb].[dbo].[#Edges].[FromNodeID] END))
       |--Sort(DISTINCT ORDER BY:([tempdb].[dbo].[#Edges].[EdgeID] ASC))
            |--Concatenation
                 |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
                 |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)

As you can see, the plans are essentially the same: they both take values from the corresponding indexes and concatenate the results.

The UNION query is in fact a little more efficient, since it uses a Merge Join to concatenate the results, and the records come out of the merge join naturally ordered, so the Stream Aggregate does not need to sort.

Kevin A. Naudé · Answer

Must you process the graph directly from SQL Server? If you are really concerned about performance, you should use one of the data-structures specifically for representing and processing graphs. Most of the work I have done with graphs (and I have done plenty) would have been infeasible if I used a generic database backend to consult the graphs.

One of the most effective representations that I have used is described in the appendices of a compiler book I have: Engineering a Compiler, by Keith Cooper and Linda Torczon.

Efficiently querying a directed/undirected table of graph edges in SQL Server

Tags:

sql

sql-server

sql-server-2008

graph-theory

sql-graph

Clare Nia

2 Answers

Quassnoi

Kevin A. Naudé

Recent Activity

Donate For Us

Efficiently querying a directed/undirected table of graph edges in SQL Server

Tags:

sql

sql-server

sql-server-2008

graph-theory

sql-graph

Clare Nia

2 Answers

Quassnoi

Kevin A. Naudé

Related questions

Recent Activity

Donate For Us