Given a directed graph, with weight assigned to every node.
Start from any node A, there will be a set of nodes which could be reached from A.
Define SUM as total weight of this set.
Question:
How to calculate SUM for every node in this graph efficiently?
The graph may contain millions of nodes.
Update 1:
The graph data structure is composed of the set of start nodes, and each node have a set of point-to nodes.
What I tried:
Calculate the descendants of every node recursively, and calculate the SUM according to descendant set. This recursion is very inefficient since I have to do set union, uteration operation for a lot of times. Further, I tried to cache the descendant set at each node. However, it easily runs out of memory.
So, is there any other solution?
Update 2:
Example graph, edges are all directed, upper nodes points to lower nodes
Result should be
SUM(1)=55
SUM(2)=35
SUM(3)=41
SUM(4)=19
SUM(5)=22
SUM(6)=25
...
Find the SCC's of the graph (Tarjan's algorithm, or a double DFS run).
For each SCC calculate the sum of it's node weights, denote this value by PARTIAL-SUM.
Iterate over the SCC's in reverse topological order; for every node in each SCC, its SUM will be the sum of all the SUM values of adjacent SCC's plus it's own PARTIAL-SUM value.
Linear running time O(E+V)
since finding SCC's is linear, topological sort is linear, and the summation is linear since we visit each SCC at most once and each branch at most once.
EDIT
As was pointed out in the comments by tzkuzzy parallel paths pose a problem. That is easily solved by a simple DFS run on the SCC graph. On any cross-edge we simply take the already visited node up the DFS tree until we reach a not-fully-searched parent, this pair of nodes (the visited on the bottom, and the ancestor) has two distinct paths between them, we make a list for each node of such descendants, and on summation merely subtract their PARTIAL-SUM value.
So if:
u
/ \
\ /
w
Our DFS will pick up the cross edge connecting from a child node of u
to w
, and trace back to u
(for those familiar with the typical DFS taught in schools the easiest explanation is that u
is characterised as the first grey ancestor of w
), so we add w
to the list we maintain on u
.
Then when we sum each SCC's adjacent SCC's as described, we add an extra step where we loop over the list mentioned and simply subtract PARTIAL-SUM values.
The DFS itself is still linear. Backtracking from a node to an ancestor can be linear if we cache the results (that way we don't traverse the same edge more than once). And the additional work in the summation is at most O(V)
, so we haven't changed the running time.
EDIT
Inclusion-exclusion makes this more difficult than I first thought. This solution isn't complete and doesn't work. A simple BFS for each node is more expensive but easier and will work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With