Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the max distance between a set of nodes on a tree?

I have a set of n nodes on a (non-binary) tree. I want to find the maximum of the distances between any two of the nodes. (I define the distance between two nodes to be the sum of the distances between those nodes and their lowest common ancestor).

I can easily solve this problem in O(n^2) by just computing the distance between each node to each other node and getting the maximum, however I'm hoping for something better as this is far too slow* for my application scenario.

(Extra info: In my application scenario, these nodes are actually files and the tree is a directory structure. Therefore, the tree is quite shallow (depth < ~10), but it may have 300,000+ nodes. The sets of files can be anywhere between ~3-200 in size. Effectively, I'm trying to figure out how far spread out are my files in each set.)*

Edit: Perhaps I can ask an equivalent question to prompt more answers: Consider a subset of the original tree that only contains the nodes in my set and the nodes necessary to connect them. Then the question becomes: How do I find the longest simple path in an undirected acyclical graph?

*Edit 2: As didierc pointed out, I actually should be considering sets of folders not files. This makes my sets smaller and the exhaustive approach may be fast enough. Still, it would be beneficial to see a faster solution, and I'm curious to see if there is one.

like image 398
Daryl Avatar asked Nov 21 '12 20:11

Daryl


People also ask

How do you find the maximum distance between two nodes in a tree?

The distance between two nodes can be obtained in terms of lowest common ancestor. Following is the formula. Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca) 'n1' and 'n2' are the two given keys 'root' is root of given Binary Tree.

What is the distance between nodes?

The distance between two nodes is defined as the total number of edges in the shortest path from one node to other. For example, consider the binary tree. The distance between node 7 and node 6 is 3. This problem is a standard application of the lowest common ancestor of given nodes.

How do you find the maximum number of nodes?

If binary tree has height h, maximum number of nodes will be when all levels are completely full. Total number of nodes will be 2^0 + 2^1 + …. 2^h = 2^(h+1)-1. For example, the binary tree shown in Figure 2(b) with height 2 has 2^(2+1)-1 = 7 nodes.

How do you find the distance between two nodes on a graph?

Starts from the first node and then keep hopping from the current set of nodes until you reach the target. The point of visited-node list is to prevent visiting the visited nodes, resulting in a loop. And to get shortest distance, it's no use to make a revisit as it always makes the distance of resulting path longer.


2 Answers

Your problem is also known as finding the diameter of a tree: among all shortest paths between pairs of nodes, you're seeking the longest.

Denote the diameter of a tree S by d(S), and its height by h(S).

The two most distant nodes in a tree S with subtrees S1...Sd can be either under one of its subtrees, or they might span two subtrees. In the first case, when the two most distant nodes are under subtree Si, d(S) is just d(Si). In the second case, when the two most distance nodes span two subtrees, say Si and Sj, their distance is h(Si) + h(Sj) + 2 because the two nodes must be the two deepest nodes in each subtree, plus two more edges to join the two subtrees. In fact, in this second case, Si and Sj must be the heighest and second heighest subtree of S.

An O(n) algorithm would proceed as follows

Algorithm d(S)

1. recursively compute d(S1)...d(Sd) and h(S1)...h(Sd) of the subtrees of S.
2. denote by Si be the deepest subtree and Sj the second deepest subtree
3. return max(d(S1), ..., d(Sd), h(Si)+h(Sj)+2)

Analysis

Lines 2 and 3 each take O(d) time to compute. But each node is examined only once by these lines, so in recursion, this takes a total of O(n).

like image 148
moos Avatar answered Oct 17 '22 21:10

moos


I have a simple O(n) greedy algorithm for this interesting problem.

Algorithm

  1. Select arbitrary vertice X as the root of the tree, and then find the vertice Y who has the maximum distance with the root X. The complexity of this step is O(n).
  2. Make vertice Y as the new root of the tree, then find the vertice Z who has the maximum distance with the root Y. And the distance between Y and Z is the maximum of the distances in the tree. The complexity of this step is also O(n).
  3. The total complexity of this greedy algorithm is O(n).

Proof

  • It is obviously that Y and Z form one diameter of the tree, and we call Y and Z a pair of corners of the tree.
  • Theorem: For each vertice P in the tree, either Y or Z would be the vertice who has the maximum distance with it.
  • The step 1 of the algorithm is based on Theorem, so we can easily get one corner (Y) of the tree.
  • The second corner Z is found based on the Theorem as well.

Extend

According to the Theorem in the proof, we can solve another more challenging problem: For each vertice in the tree, calculate who is fareast vertice to it.

  • We can find the two corners of the tree in O(n) complexity, and then we can use the Theorem again.
  • From corner Y and Z we do dfs respectively, and for each vertices p[i] we can get the distance to Y and Z(we call them disY[i] and disZ[i]), so the fareast distance for p[i] is max(disY[i], disZ[i]). Due to we just do dfs twice, so we can get the infomation in O(n) complexity.
  • This extended problem can also solved by complexible tree dynamic programming whose complexity is also O(n).
like image 6
Jun HU Avatar answered Oct 17 '22 20:10

Jun HU