I have a set of n nodes on a (non-binary) tree. I want to find the maximum of the distances between any two of the nodes. (I define the distance between two nodes to be the sum of the distances between those nodes and their lowest common ancestor).
I can easily solve this problem in O(n^2) by just computing the distance between each node to each other node and getting the maximum, however I'm hoping for something better as this is far too slow* for my application scenario.
(Extra info: In my application scenario, these nodes are actually files and the tree is a directory structure. Therefore, the tree is quite shallow (depth < ~10), but it may have 300,000+ nodes. The sets of files can be anywhere between ~3-200 in size. Effectively, I'm trying to figure out how far spread out are my files in each set.)*
Edit: Perhaps I can ask an equivalent question to prompt more answers: Consider a subset of the original tree that only contains the nodes in my set and the nodes necessary to connect them. Then the question becomes: How do I find the longest simple path in an undirected acyclical graph?
*Edit 2: As didierc pointed out, I actually should be considering sets of folders not files. This makes my sets smaller and the exhaustive approach may be fast enough. Still, it would be beneficial to see a faster solution, and I'm curious to see if there is one.
The distance between two nodes can be obtained in terms of lowest common ancestor. Following is the formula. Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca) 'n1' and 'n2' are the two given keys 'root' is root of given Binary Tree.
The distance between two nodes is defined as the total number of edges in the shortest path from one node to other. For example, consider the binary tree. The distance between node 7 and node 6 is 3. This problem is a standard application of the lowest common ancestor of given nodes.
If binary tree has height h, maximum number of nodes will be when all levels are completely full. Total number of nodes will be 2^0 + 2^1 + …. 2^h = 2^(h+1)-1. For example, the binary tree shown in Figure 2(b) with height 2 has 2^(2+1)-1 = 7 nodes.
Starts from the first node and then keep hopping from the current set of nodes until you reach the target. The point of visited-node list is to prevent visiting the visited nodes, resulting in a loop. And to get shortest distance, it's no use to make a revisit as it always makes the distance of resulting path longer.
Your problem is also known as finding the diameter of a tree: among all shortest paths between pairs of nodes, you're seeking the longest.
Denote the diameter of a tree S by d(S), and its height by h(S).
The two most distant nodes in a tree S with subtrees S1...Sd can be either under one of its subtrees, or they might span two subtrees. In the first case, when the two most distant nodes are under subtree Si, d(S) is just d(Si). In the second case, when the two most distance nodes span two subtrees, say Si and Sj, their distance is h(Si) + h(Sj) + 2 because the two nodes must be the two deepest nodes in each subtree, plus two more edges to join the two subtrees. In fact, in this second case, Si and Sj must be the heighest and second heighest subtree of S.
An O(n) algorithm would proceed as follows
1. recursively compute d(S1)...d(Sd) and h(S1)...h(Sd) of the subtrees of S.
2. denote by Si be the deepest subtree and Sj the second deepest subtree
3. return max(d(S1), ..., d(Sd), h(Si)+h(Sj)+2)
Lines 2 and 3 each take O(d) time to compute. But each node is examined only once by these lines, so in recursion, this takes a total of O(n).
I have a simple O(n) greedy algorithm for this interesting problem.
According to the Theorem in the proof, we can solve another more challenging problem: For each vertice in the tree, calculate who is fareast vertice to it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With