Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Directed graph in Oracle SQL using recursive query visiting each node only once

Description

In our problem domain we are working on a set of edges that connect together forming a graph. Starting from a given node (or nodes), we have to list all links within the entire graph that are connected to the given node (or nodes). We have to show these links from left-to-right, top-to-bottom.

We have a working query for this problem for graphs with a limited number of loops. Higher number of loops increases the execution time exponentially.

We need to limit visits to the same node during recursion to have a working solution.

The example below contains only a single loop, but this single loop is already causing 86 additional and obsolete rows.

In similar posts solutions are provided for postgresql using ROW and ANY operators, but I could not find an Oracle solution.

We are searching for either an alternative for our solution or a way to limit the number of visits to the same edges.

Any help is greatly appreciated!

Similar

Visiting a directed graph as if it were an undirected one, using a recursive query provides a solution in postgresql. We are required to use Oracle11g.

Example

Edges

A-B, B-D, C-A, C-E, C-F, H-F, E-B, G-D, G-I

Graphical

    A
  /   \
C - E - B - D
  \       /
H - F   G - I

DDL and DML

CREATE TABLE EDGE (
  FROM_ID VARCHAR(10),
  TO_ID   VARCHAR(10)
);

INSERT INTO EDGE VALUES ('A', 'B');
INSERT INTO EDGE VALUES ('E', 'B');
INSERT INTO EDGE VALUES ('C', 'E');
INSERT INTO EDGE VALUES ('C', 'A');
INSERT INTO EDGE VALUES ('C', 'F');
INSERT INTO EDGE VALUES ('B', 'D');
INSERT INTO EDGE VALUES ('G', 'D');
INSERT INTO EDGE VALUES ('H', 'F');
INSERT INTO EDGE VALUES ('G', 'I');

Input

nodes: 'A'

Required Output

C   A
C   E
C   F
H   F
A   B
E   B
B   D
G   D
G   I

Current Solution

Our current solution returns exactly what we need, but as mentioned above each additional loop increases the execution time exponentially.

SELECT
  c.LVL,
  c.FROM_ID,
  c.TO_ID,
  CASE
  WHEN lag(C.TO_ID)
       OVER (
         PARTITION BY C.LVL
         ORDER BY C.LVL, C.TO_ID ) = C.TO_ID
    THEN C.LVL || '-' || C.TO_ID
  WHEN lead(C.TO_ID)
       OVER (
         PARTITION BY C.LVL
         ORDER BY C.LVL, C.TO_ID ) = C.TO_ID
    THEN C.LVL || '-' || C.TO_ID
  ELSE C.LVL || '-' || C.FROM_ID
  END GROUP_ID
FROM (
       WITH chain(LVL, FROM_ID, TO_ID ) AS (
         SELECT
           1            LVL,
           root.FROM_ID FROM_ID,
           root.TO_ID   TO_ID
         FROM EDGE root
         WHERE root.TO_ID IN (:nodes)
               OR (root.FROM_ID IN (:nodes) AND NOT EXISTS(
             SELECT *
             FROM EDGE
             WHERE TO_ID IN (:nodes)
         ))
         UNION ALL
         SELECT
           LVL +
           CASE
           WHEN previous.TO_ID = the_next.FROM_ID
             THEN 1
           WHEN previous.TO_ID = the_next.TO_ID
             THEN 0
           WHEN previous.FROM_ID = the_next.FROM_ID
             THEN 0
           ELSE -1
           END              LVL,
           the_next.FROM_ID FROM_ID,
           the_next.TO_ID   TO_ID
         FROM EDGE the_next
           JOIN chain previous ON previous.TO_ID = the_next.FROM_ID
                                  OR the_next.TO_ID = previous.FROM_ID
                                  OR (previous.TO_ID = the_next.TO_ID AND previous.FROM_ID <> the_next.FROM_ID)
                                  OR (previous.TO_ID <> the_next.TO_ID AND previous.FROM_ID = the_next.FROM_ID)
       )
         SEARCH BREADTH FIRST BY FROM_ID SET ORDER_ID
         CYCLE FROM_ID, TO_ID SET CYCLE TO 1 DEFAULT 0
       SELECT
         C.*,
         row_number()
         OVER (
           PARTITION BY LVL, FROM_ID, TO_ID
           ORDER BY ORDER_ID ) rank
       FROM chain C
       ORDER BY LVL, FROM_ID, TO_ID
     ) C
WHERE C.rank = 1;
like image 910
Bart Gerard Avatar asked Jun 28 '17 09:06

Bart Gerard


1 Answers

For keeping the traversing algorithm out of returning to already visited edges, one can indeed keep the visited edges somewhere. As you already found out, you won't get much of a success with a string concatenation. However, there are other usable "value concatenation" techniques available...

You have to have one handy schema-level collection of scalars at your disposal:

create or replace type arr_strings is table of varchar2(64);

And then you can collect the visited edges to that collection in each iteration:

with nondirected$ as (
    select from_id, to_id, from_id||'-'||to_id as edge_desc
    from edge
    where from_id != to_id
    union all
    select to_id, from_id, from_id||'-'||to_id as edge_desc
    from edge
    where (to_id, from_id) not in (
            select from_id, to_id
            from edge
        )
),
graph$(lvl, from_id, to_id, edge_desc, visited_edges) as (
    select 1, from_id, to_id, edge_desc,
        arr_strings(edge_desc)
    from nondirected$ R
    where from_id in (&nodes)
    --
    union all
    --
    select
        lvl+1,
        Y.from_id, Y.to_id, Y.edge_desc,
        X.visited_edges multiset union arr_strings(Y.edge_desc)
    from graph$ X
        join nondirected$ Y
            on Y.from_id = X.to_id
    where not exists (
            select 1
            from table(X.visited_edges) Z
            where Y.edge_desc = Z.column_value
        )
)
search breadth first by edge_desc set order_id
    cycle edge_desc set is_cycle to 1 default 0,
ranked_graph$ as (
    select C.*,
        row_number() over (partition by edge_desc order by lvl, order_id) as rank$
    from graph$ C
--    where is_cycle = 0
)
select *
from ranked_graph$
--where rank$ <= 1
order by lvl, order_id
;

Notes

  1. I pre-processed the directed graph to a nondirected one by union-ing a set of reverse edges to the input. That should make the recursive traversal predicates easier to read. Solely for my purposes of easier reading+writing of the SQL. You don't have to do that, of course.
  2. I remember trying something like this a few years ago on Oracle 11.2. And I remember that it was failing, though I don't remember why. On 12.2, it ran OK. Just give it a try on 11g, too; I don't have one available.
  3. Since each iteration does, in addition to the traversal inner join, also an anti-join, I sincerely doubt that this is going to be more performant. However, it for sure solves the problem of lowering the number of recursive nestings.
  4. You'll have to resolve the desired ordering on your own, as you probably understood from my comments. :-)

Limiting the revisited edges to zero

In SQL, you can't. The PostgreSQL solution you mentioned does do it. In Oracle, however, you can't. You would have to, for each traversal join, test rows for all other traversal joins. And that would mean some kind of aggregation or analytics... which Oracle forbids and throws out an ORA exception.

PLSQL to the rescue?

You can do it in PL/SQL, though. How much performant it is supposed to be, depends on how much memory you want to spend on prefetching edges from DB vs. how many SQL roundtrips you are willing to take to traverse the graph from "current" nodes or if you are willing to use even more memory to keep the visited nodes in a fancy indexed-by-edge collection vs. if you rather anti-join against a regular arr_output collection l_visited_nodes. You have multiple choices, choose wisely.

Anyway, for the simplest scenario with heavier use of SQL engine, this might be the code you're looking for...

create or replace
package pkg_so_recursive_traversal
is


type rec_output                     is record (
    from_id                             edge.from_id%type,
    to_id                               edge.to_id%type,
    lvl                                 integer
);
type arr_output                     is table of rec_output;


function traverse_a_graph
    ( i_from                        in arr_strings
    , i_is_directed                 in varchar2 default 'NO' )
    return arr_output
    pipelined;


end pkg_so_recursive_traversal;
/
create or replace
package body pkg_so_recursive_traversal
is


function traverse_a_graph
    ( i_from                        in arr_strings
    , i_is_directed                 in varchar2 )
    return arr_output
    pipelined
is
    l_next_edges                    arr_output;
    l_current_edges                 arr_output;
    l_visited_edges                 arr_output := arr_output();
    l_out                           rec_output;
    i                               pls_integer;
    l_is_directed                   varchar2(32) := case when i_is_directed = 'YES' then 'YES' else 'NO' end;
begin
    select E.from_id, E.to_id, 0
    bulk collect into l_next_edges
    from table(i_from) F
        join edge E
            on F.column_value in (E.from_id, case when l_is_directed = 'YES' then null else E.to_id end)
    where E.from_id != E.to_id;

    l_out.lvl := 0;

    loop
        dbms_output.put_line(l_next_edges.count());
        exit when l_next_edges.count() <= 0;
        l_out.lvl := l_out.lvl + 1;

        -- spool the edges to output
        i := l_next_edges.first();
        while i is not null loop
            l_out.from_id := l_next_edges(i).from_id;
            l_out.to_id := l_next_edges(i).to_id;
            pipe row(l_out);
            i := l_next_edges.next(i);
        end loop;

        l_current_edges := l_next_edges;
        l_visited_edges := l_visited_edges multiset union l_current_edges;

        -- find next edges
        select unique E.from_id, E.to_id, 0
        bulk collect into l_next_edges
        from table(l_current_edges) CE
            join edge E
                on CE.to_id in (E.from_id, case when l_is_directed = 'YES' then null else E.to_id end)
                or l_is_directed = 'NO' and CE.from_id in (E.from_id, E.to_id)
        where E.from_id != E.to_id
            and not exists (
                select 1
                from table(l_visited_edges) VE
                where VE.from_id = E.from_id
                    and VE.to_id = E.to_id
            );
    end loop;

    return;
end;


end pkg_so_recursive_traversal;

/

When called for the starting node of A and considering the graph to be undirected...

select *
from table(pkg_so_recursive_traversal.traverse_a_graph(
        i_from => arr_strings('A'),
        i_is_directed => 'NO'
    ));

... it yields...

FROM_ID    TO_ID             LVL
---------- ---------- ----------
A          B                   1
C          A                   1
C          E                   2
B          D                   2
C          F                   2
E          B                   2
G          D                   3
H          F                   3
G          I                   4

Notes

  1. Again, I did not put any effort to keep the ordering you requested, as you said it's not that important.
  2. This is doing multiple (exactly 5 for your example inputs) SQL roundtrips to the edge table. That may or may not be a bigger performance hit when compared to the pure-SQL solution with redundant edge visiting. Test more solutions properly, see which one works for you the best.
  3. This particular piece of code will work on 12c and higher. For 11g and lower you'll have to declare the rec_output and arr_output types on the schema level.
like image 101
peter.hrasko.sk Avatar answered Nov 12 '22 10:11

peter.hrasko.sk