Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding number of nodes within a certain distance in a rooted tree

Tags:

algorithm

In a rooted and weighted tree, how can you find the number of nodes within a certain distance from each node? You only need to consider down edges, e.g. nodes going down from the root. Keep in mind each edge has a weight.

I can do this in O(N^2) time using a DFS from each node and keeping track of the distance traveled, but with N >= 100000 it's a bit slow. I'm pretty sure you could easily solve it with unweighted edges with DP, but anyone know how to solve this one quickly? (Less than N^2)

like image 454
Forza Avatar asked Dec 16 '12 20:12

Forza


1 Answers

It's possible to improve my previous answer to O(nlog d) time and O(n) space by making use of the following observation:

The number of sufficiently-close nodes at a given node v is the sum of the numbers of sufficiently-close nodes of each of its children, less the number of nodes that have just become insufficiently-close.

Let's call the distance threshold m, and the distance on the edge between two adjacent nodes u and v d(u, v).

Every node has a single ancestor that is the first ancestor to miss out

For each node v, we will maintain a count, c(v), that is initially 0.

For any node v, consider the chain of ancestors from v's parent up to the root. Call the ith node in this chain a(v, i). Notice that v needs to be counted as sufficiently close in some number i >= 0 of the first nodes in this chain, and in no other nodes. If we are able to quickly find i, then we can simply decrement c(a(v, i+1)) (bringing it (possibly further) below 0), so that when the counts of a(v, i+1)'s children are added to it in a later pass, v is correctly excluded from being counted. Provided we calculate fully accurate counts for all children of a node v before adding them to c(v), any such exclusions are correctly "propagated" to parent counts.

The tricky part is finding i efficiently. Call the sum of the distances of the first j >= 0 edges on the path from v to the root s(v, j), and call the list of all depth(v)+1 of these path lengths, listed in increasing order, s(v). What we want to do is binary-search the list of path lengths s(v) for the first entry greater than the threshold m: this would find i+1 in log(d) time. The problem is constructing s(v). We could easily build it using a running total from v up to the root -- but that would require O(d) time per node, nullifying any time improvement. We need a way to construct s(v) from s(parent(v)) in constant time, but the problem is that as we recurse from a node v to its child u, the path lengths grow "the wrong way": every path length x needs to become x + d(u, v), and a new path length of 0 needs to be added at the beginning. This appears to require O(d) updates, but a trick gets around the problem...

Finding i quickly

The solution is to calculate, at each node v, the total path length t(v) of all edges on the path from v to the root. This is easily done in constant time per node: t(v) = t(parent(v)) + d(v, parent(v)). We can then form s(v) by prepending -t to the beginning of s(parent(v)), and when performing the binary search, consider each element s(v, j) to represent s(v, j) + t (or equivalently, binary search for m - t instead of m). The insertion of -t at the start can be achieved in O(1) time by having a child u of a node v share v's path length array, with s(u) considered to begin one memory location before s(v). All path length arrays are "right-justified" inside a single memory buffer of size d+1 -- specifically, nodes at depth k will have their path length array begin at offset d-k inside the buffer to allow room for their descendant nodes to prepend entries. The array sharing means that sibling nodes will overwrite each other's path lengths, but this is not a problem: we only need the values in s(v) to remain valid while v and v's descendants are processed in the preorder DFS.

In this way we gain the effect of O(d) path length increases in O(1) time. Thus the total time required to find i at a given node is O(1) (to build s(v)) plus O(log d) (to find i using the modified binary search) = O(log d). A single preorder DFS pass is used to find and decrement the appropriate ancestor's count for each node; a postorder DFS pass then sums child counts into parent counts. These two passes can be combined into a single pass over the nodes that performs operations both before and after recursing.

like image 144
j_random_hacker Avatar answered Oct 30 '22 13:10

j_random_hacker