Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find largest common sub-tree in the given two binary search trees?

Two BSTs (Binary Search Trees) are given. How to find largest common sub-tree in the given two binary trees?

EDIT 1: Here is what I have thought:

Let, r1 = current node of 1st tree r2 = current node of 2nd tree

There are some of the cases I think we need to consider:

Case 1 : r1.data < r2.data
     2 subproblems to solve:
     first, check r1 and r2.left 
     second, check r1.right and r2

Case 2 : r1.data > r2.data
     2 subproblems to solve:
       - first, check r1.left and r2
       - second, check r1 and r2.right

Case 3 : r1.data == r2.data
     Again, 2 cases to consider here:
     (a) current node is part of largest common BST
         compute common subtree size rooted at r1 and r2
     (b)current node is NOT part of largest common BST
         2 subproblems to solve:
         first, solve r1.left and r2.left 
         second, solve r1.right and r2.right

I can think of the cases we need to check, but I am not able to code it, as of now. And it is NOT a homework problem. Does it look like?

like image 747
Bhushan Avatar asked Jul 13 '11 04:07

Bhushan


2 Answers

Just hash the children and key of each node and look for duplicates. This would give a linear expected time algorithm. For example, see the following pseudocode, which assumes that there are no hash collisions (dealing with collisions would be straightforward):

ret = -1
// T is a tree node, H is a hash set, and first is a boolean flag
hashTree(T, H, first):
  if (T is null):
    return 0 // leaf case
  h = hash(hashTree(T.left, H, first), hashTree(T.right, H, first), T.key)
  if (first):
    // store hashes of T1's nodes in the set H
    H.insert(h)
  else:
    // check for hashes of T2's nodes in the set H containing T1's nodes
    if H.contains(h):
      ret = max(ret, size(T)) // size is recursive and memoized to get O(n) total time
  return h

H = {}
hashTree(T1, H, true)
hashTree(T2, H, false)
return ret

Note that this is assuming the standard definition of a subtree of a BST, namely that a subtree consists of a node and all of its descendants.

like image 102
jonderry Avatar answered Jan 13 '23 18:01

jonderry


Assuming there are no duplicate values in the trees:

LargestSubtree(Tree tree1, Tree tree2)
    Int bestMatch := 0
    Int bestMatchCount := 0
    For each Node n in tree1  //should iterate breadth-first
        //possible optimization:  we can skip every node that is part of each subtree we find
        Node n2 := BinarySearch(tree2(n.value))
        Int matchCount := CountMatches(n, n2)
        If (matchCount > bestMatchCount)
            bestMatch := n.value
            bestMatchCount := matchCount
        End
    End

    Return ExtractSubtree(BinarySearch(tree1(bestMatch)), BinarySearch(tree2(bestMatch)))
End

CountMatches(Node n1, Node n2)
    If (!n1 || !n2 || n1.value != n2.value)
        Return 0
    End
    Return 1 + CountMatches(n1.left, n2.left) + CountMatches(n1.right, n2.right)
End

ExtractSubtree(Node n1, Node n2)
    If (!n1 || !n2 || n1.value != n2.value)
        Return nil
    End

    Node result := New Node(n1.value)
    result.left := ExtractSubtree(n1.left, n2.left)
    result.right := ExtractSubtree(n1.right, n2.right)
    Return result
End

To briefly explain, this is a brute-force solution to the problem. It does a breadth-first walk of the first tree. For each node, it performs a BinarySearch of the second tree to locate the corresponding node in that tree. Then using those nodes it evaluates the total size of the common subtree rooted there. If the subtree is larger than any previously found subtree, it remembers it for later so that it can construct and return a copy of the largest subtree when the algorithm completes.

This algorithm does not handle duplicate values. It could be extended to do so by using a BinarySearch implementation that returns a list of all nodes with the given value, instead of just a single node. Then the algorithm could iterate this list and evaluate the subtree for each node and then proceed as normal.

The running time of this algorithm is O(n log m) (it traverses n nodes in the first tree, and performs a log m binary-search operation for each one), putting it on par with most common sorting algorithms. The space complexity is O(1) while running (nothing allocated beyond a few temporary variables), and O(n) when it returns its result (because it creates an explicit copy of the subtree, which may not be required depending upon exactly how the algorithm is supposed to express its result). So even this brute-force approach should perform reasonably well, although as noted by other answers an O(n) solution is possible.

There are also possible optimizations that could be applied to this algorithm, such as skipping over any nodes that were contained in a previously evaluated subtree. Because the tree-walk is breadth-first we know than any node that was part of some prior subtree cannot ever be the root of a larger subtree. This could significantly improve the performance of the algorithm in certain cases, but the worst-case running time (two trees with no common subtrees) would still be O(n log m).

like image 25
aroth Avatar answered Jan 13 '23 20:01

aroth