Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Compute Space Complexity for Binary SubTree Finding

This problem is from the book Cracking the Coding Interview. I have trouble understanding the space complexity of the solution.

Problem:
You have two very large binary trees: T1, with millions of nodes, and T2, with hundreds of nodes. Create an algorithm to decide if T2 is a subtree of T1.

Solution (in Java):

public static boolean containsTree(TreeNode t1, TreeNode t2) {
    if (t2 == null)
        return true; // The empty tree is a subtree of every tree.
    else
        return subTree(t1, t2);
}

/* Checks if the binary tree rooted at r1 contains the binary tree 
 * rooted at r2 as a subtree somewhere within it.
 */
public static boolean subTree(TreeNode r1, TreeNode r2) {
    if (r1 == null)
        return false; // big tree empty & subtree still not found.
    if (r1.data == r2.data) {
        if (matchTree(r1,r2)) return true;
    }
    return (subTree(r1.left, r2) || subTree(r1.right, r2)); 
}

/* Checks if the binary tree rooted at r1 contains the 
 * binary tree rooted at r2 as a subtree starting at r1.
 */
public static boolean matchTree(TreeNode r1, TreeNode r2) {
    if (r2 == null && r1 == null) 
        return true; // nothing left in the subtree
    if (r1 == null || r2 == null) 
        return false; //  big tree empty & subtree still not found
    if (r1.data != r2.data) 
        return false;  // data doesn’t match
    return (matchTree(r1.left, r2.left) && 
            matchTree(r1.right, r2.right));
}

The book says the space complexity of this solution is O(log(n) +log(m)) where m is the number of nodes in T1 (larger tree) and n number of nodes in T2.

To me, it appears that the solution has O(log(m)*log(n)) space complexity since "subtree" function has log(n) recursive calls and each recursive call executes "matchTree" function which triggers log(m) recursive calls.

Why is this solution O(log(n) + log(m)) complexity?

like image 665
jjwest Avatar asked Jun 07 '15 04:06

jjwest


1 Answers

Since we're not creating any objects on the heap, the space complexity is the size of the stack. So the question is not how many total calls occur, but how big the stack can grow.

containsTree() can only call subTree(), subTree() can call itself or matchTree(), and matchTree() can only call itself. So at any point where matchTree() has been called, the stack looks like this:

[containsTree] [subTree] ... [subTree] [matchTree] ... [matchTree]

This is why you don't multiply the space complexities here: while each call to subTree() can call matchTree(), those calls to matchTree() leave the stack before subTree() continues recursing.

Analysis along the lines of the "correct answer"

If the question doesn't specify if the trees are balanced, then a real worst-case analysis would assume they might not be. However, you and the book are assuming they are. We can set aside that question for later by saying the depth of T1 is c, and the depth of T2 is d. c is O(log(m)) if T1 is balanced, and O(m) otherwise. Same thing for T2's d.

Worst case for matchTree() is O(d), because the furthest it could recurse would be the height of T2.

Worst case for subTree() is O(c) for its recursion, because the furthest it could recurse would be the height of T1, plus the cost of calling matchTree(), for a total of O(c+d).

And containsTree() just adds a constant on top of calling subTree(), so that doesn't change the space complexity.

So if both T1 and T2 are balanced, by replacing c and d you can see that O(log(m)+log(n)) seems reasonable.

Problems with the "correct answer"

Like I said before, it's not right to assume binary trees are balanced until you know for a fact that they are. So a better answer might be O(m+n).

But wait! The question states that the size of T2 is less than the size of T1. That means that n is O(m), and log(n) is O(log(m)). So why have we been wasting time worrying about n?

If the trees are balanced, the space complexity is simply O(log(m)). In the general case where you don't know what's balanced or not, the real answer should be O(m), the size of the larger tree.

like image 153
Dan Getz Avatar answered Nov 07 '22 21:11

Dan Getz