Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to traverse a hierarchical tree-structure structure backwards using recursive queries

I'm using PostgreSQL 9.1 to query hierarchical tree-structured data, consisting of edges (or elements) with connections to nodes. The data are actually for stream networks, but I've abstracted the problem to simple data types. Consider the example tree table. Each edge has length and area attributes, which are used to determine some useful metrics from the network.

CREATE TEMP TABLE tree (
  edge text PRIMARY KEY,
  from_node integer UNIQUE NOT NULL, -- can also act as PK
  to_node integer REFERENCES tree (from_node),
  mode character varying(5), -- redundant, but illustrative
  length numeric NOT NULL,
  area numeric NOT NULL,
  fwd_path text[], -- optional ordered sequence, useful for debugging
  fwd_search_depth integer,
  fwd_length numeric,
  rev_path text[], -- optional unordered set, useful for debugging
  rev_search_depth integer,
  rev_length numeric,
  rev_area numeric
);
CREATE INDEX ON tree (to_node);
INSERT INTO tree(edge, from_node, to_node, mode, length, area) VALUES
 ('A', 1, 4, 'start', 1.1, 0.9),
 ('B', 2, 4, 'start', 1.2, 1.3),
 ('C', 3, 5, 'start', 1.8, 2.4),
 ('D', 4, 5, NULL, 1.2, 1.3),
 ('E', 5, NULL, 'end', 1.1, 0.9);

Which can be illustrated below, where the edges represented by A-E are connected with nodes 1-5. The NULL to_node (Ø) represents the end node. The from_node is always unique, so it can act as PK. If this network flows like a drainage basin, it would be from top to bottom, where the starting tributary edges are A, B, C and the ending outflow edge is E.

tree

The documentation for WITH provide a nice example of how to use search graphs in recursive queries. So, to get the "forwards" information, the query starts at the end, and works backwards:

WITH RECURSIVE search_graph AS (
  -- Begin at ending nodes
  SELECT E.from_node, 1 AS search_depth, E.length
    , ARRAY[E.edge] AS path -- optional
  FROM tree E WHERE E.to_node IS NULL

  UNION ALL
  -- Accumulate each edge, working backwards (upstream)
  SELECT o.from_node, sg.search_depth + 1, sg.length + o.length
    , o.edge|| sg.path -- optional
  FROM tree o, search_graph sg
  WHERE o.to_node = sg.from_node
)
UPDATE tree SET
  fwd_path = sg.path,
  fwd_search_depth = sg.search_depth,
  fwd_length = sg.length
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;

SELECT edge, from_node, to_node, fwd_path, fwd_search_depth, fwd_length
FROM tree ORDER BY edge;

 edge | from_node | to_node | fwd_path | fwd_search_depth | fwd_length
------+-----------+---------+----------+------------------+------------
 A    |         1 |       4 | {A,D,E}  |                3 |        3.4
 B    |         2 |       4 | {B,D,E}  |                3 |        3.5
 C    |         3 |       5 | {C,E}    |                2 |        2.9
 D    |         4 |       5 | {D,E}    |                2 |        2.3
 E    |         5 |         | {E}      |                1 |        1.1

The above makes sense, and scales well for large networks. For example, I can see edge B is 3 edges from the end, and the forward path is {B,D,E} with a total length of 3.5 from the tip to the end.

However, I cannot figure out a good way to build a reverse query. That is, from each edge, what are the accumulated "upstream" edges, lengths and areas. Using WITH RECURSIVE, the best I have is:

WITH RECURSIVE search_graph AS (
  -- Begin at starting nodes
  SELECT S.from_node, S.to_node, 1 AS search_depth, S.length, S.area
    , ARRAY[S.edge] AS path -- optional
  FROM tree S WHERE from_node IN (
    -- Starting nodes have a from_node without any to_node
    SELECT from_node FROM tree EXCEPT SELECT to_node FROM tree)

  UNION ALL
  -- Accumulate edges, working forwards
  SELECT c.from_node, c.to_node, sg.search_depth + 1, sg.length + c.length, sg.area + c.area
         , c.edge || sg.path -- optional
  FROM tree c, search_graph sg
  WHERE c.from_node = sg.to_node
)
UPDATE tree SET
  rev_path = sg.path,
  rev_search_depth = sg.search_depth,
  rev_length = sg.length,
  rev_area = sg.area
FROM search_graph AS sg WHERE sg.from_node = tree.from_node;

SELECT edge, from_node, to_node, rev_path, rev_search_depth, rev_length, rev_area
FROM tree ORDER BY edge;

 edge | from_node | to_node | rev_path | rev_search_depth | rev_length | rev_area
------+-----------+---------+----------+------------------+------------+----------
 A    |         1 |       4 | {A}      |                1 |        1.1 |      0.9
 B    |         2 |       4 | {B}      |                1 |        1.2 |      1.3
 C    |         3 |       5 | {C}      |                1 |        1.8 |      2.4
 D    |         4 |       5 | {D,A}    |                2 |        2.3 |      2.2
 E    |         5 |         | {E,C}    |                2 |        2.9 |      3.3

I would like to build aggregates into the second term of the recursive query, since each downstream edge connects to 1 or many upstream edges, but aggregates are not allowed with recursive queries. Also, I'm aware that the join is sloppy, since the with recursive result has multiple join conditions for edge.

The expected result for the reverse / backwards query is:

 edge | from_node | to_node |  rev_path   | rev_search_depth | rev_length | rev_area
------+-----------+---------+-------------+------------------+------------+----------
 A    |         1 |       4 | {A}         |                1 |        1.1 |      0.9
 B    |         2 |       4 | {B}         |                1 |        1.2 |      1.3
 C    |         3 |       5 | {C}         |                1 |        1.8 |      2.4
 D    |         4 |       5 | {A,B,D}     |                3 |        3.5 |      3.5
 E    |         5 |         | {A,B,C,D,E} |                5 |        6.4 |      6.8

How can I write this update query?

Note that I'm ultimately more concerned about accumulating accurate length and area totals, and that the path attributes are for debugging. In my real-world case, forwards paths are up to a couple hundred, and I expect reverse paths in the tens of thousands for large and complex catchments.

like image 953
Mike T Avatar asked Feb 24 '15 04:02

Mike T


1 Answers

UPDATE 2: I rewrote the original recursive query so that all accumulation/aggregation is done outside the recursive part. It should perform better than the previous version of this answer. This is very much alike the answer from @a_horse_with_no_name for a similar question.

  WITH 
    RECURSIVE search_graph(edge, from_node, to_node, length, area, start_node) AS
    (
        SELECT edge, from_node, to_node, length, area, from_node AS "start_node"
        FROM tree
        UNION ALL
        SELECT o.edge, o.from_node, o.to_node, o.length, o.area, p.start_node
        FROM tree o
    JOIN search_graph p ON p.from_node = o.to_node
    )
    SELECT array_agg(edge) AS "edges"
       -- ,array_agg(from_node) AS "nodes"
          ,count(edge) AS "edge_count"
          ,sum(length) AS "length_sum"
          ,sum(area) AS "area_sum"
    FROM search_graph
    GROUP BY start_node
    ORDER BY start_node
;

Results are as expected:

 start_node | edges       | edge_count | length_sum |  area_sum
------------+-------------+------------+------------+------------
  1         | {A}         |          1 |        1.1 |       0.9
  2         | {B}         |          1 |        1.2 |       1.3
  3         | {C}         |          1 |        1.8 |       2.4
  4         | {D,B,A}     |          3 |        3.5 |       3.5
  5         | {E,D,C,B,A} |          5 |        6.4 |       6.8
like image 117
Geir Bostad Avatar answered Sep 27 '22 16:09

Geir Bostad