I have a connected set of edges with unique nodes. They are connected using a parent node. Consider the following example code and illustration:
CREATE TABLE network (
node integer PRIMARY KEY,
parent integer REFERENCES network(node),
length numeric NOT NULL
);
CREATE INDEX ON network (parent);
INSERT INTO network (node, parent, length) VALUES
(1, NULL, 1.3),
(2, 1, 1.2),
(3, 2, 0.9),
(4, 3, 1.4),
(5, 4, 1.6),
(6, 2, 1.5),
(7, NULL, 1.0);
Visually, two groups of edges can be identified. How can the two groups be identified using PostgreSQL 9.1, and length
summed? The expected result is shown:
edges_in_group | total_edges | total_length
----------------+-------------+--------------
{1,2,3,4,5,6} | 6 | 7.9
{7} | 1 | 1.0
(2 rows)
I don't even know where to begin. Do I need a custom aggregate or window function? Could I use WITH RECURSIVE
to iteratively collect edges that connect? My real world case is a stream network of 245,000 edges. I expect the maximum number of edges_in_group
to be less than 200, and a couple hundred aggregated groups (rows).
A recursive query is the way to go:
with recursive tree as (
select node, parent, length, node as root_id
from network
where parent is null
union all
select c.node, c.parent, c.length, p.root_id
from network c
join tree p on p.node = c.parent
)
select root_id, array_agg(node) as edges_in_group, sum(length) as total_length
from tree
group by root_id;
The important thing is to keep the id of the root node in each recursion, so that you can group by that id in the final result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With