This earlier question asked how many ways there were to insert the values 1 - 7 into a binary search tree that would result in the following tree:
      4
    /   \
   2     6
  / \   / \
 1   3 5   7
(The answer is 80, by the way).
Suppose more generally that you're given an arbitrary BST holding some set of values and want to know how many possible ways there are to insert those values into a BST that would end up producing the resulting tree. Is there an efficient algorithm for determining this?
Thanks!
We can solve this using a clever recursive algorithm.
As our base case, if you are given an empty tree, there is exactly one way to build that tree - don't insert any values.
For the recursive case, let's suppose that you have a BST that looks like this:
     v
    / \
   L   R
Here, v is the root, and L and R are its left and right subtrees, respectively.
If we want to build up this binary search tree, we would have to start off by inserting v first - if we didn't, v wouldn't be the root. After we insert v, we need to insert the elements in an ordering that will cause the subtrees L and R to be rebuilt. The tricky part here is that we don't need to build up all of L before building up R or vice-versa; we could insert some elements from L, then some elements from R, then more elements from L, then more elements from R, etc.
Fortunately, though, there is a useful observation we can make. Suppose that SL is a sequence of elements that, if inserted into a BST, forms the BST L. Similarly, let SR be a sequence of elements that, if inserted into a BST, forms the BST R. If we take any possible interleaving of the sequences SL and SR, we end up with a sequence of elements that, if inserted into a BST containing just v, will build up the tree
     v
    / \
   L   R
As an example, consider this tree:
      4
    /   \
   2     6
  / \   / \
 1   3 5   7
One possible sequence that builds the left subtree (holding 1, 2, 3) is 2, 1, 3. One possible sequence that builds the right subtree is 6, 5, 7. Any possible interleaving of those sequences, when inserted into a BST containing just the root node 4, will end up building out the above BST. For example, any of these sequences will work:
2, 1, 3, 6, 5, 7
2, 6, 1, 5, 3, 7
6, 5, 2, 1, 3, 7
...
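To see this claim in action, here is a small sanity check (a sketch added here, not part of the original answer) that inserts the root 4 followed by each interleaving above into a plain BST and confirms that every ordering produces the same tree; the list-based tree representation is just an illustrative choice:

def insert(root, value):
    # Standard unbalanced BST insertion on a [value, left, right] list
    # representation (an illustrative choice, not from the original answer).
    if root is None:
        return [value, None, None]
    if value < root[0]:
        root[1] = insert(root[1], value)
    else:
        root[2] = insert(root[2], value)
    return root

def build(order):
    root = None
    for v in order:
        root = insert(root, v)
    return root

orderings = [
    [4, 2, 1, 3, 6, 5, 7],   # root first, then the interleavings listed above
    [4, 2, 6, 1, 5, 3, 7],
    [4, 6, 5, 2, 1, 3, 7],
]
print(len({str(build(o)) for o in orderings}))  # 1 -- all build the same tree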
Since given any sequences SL and SR that build up L and R we can interleave them in any order, we can write out a nice formula for the number of sequences that will build out a tree with v as its root, L as its left subtree, and R as its right subtree:
# ways = (# interleaves of SL and SR) × (# possible SLs) × (# possible SRs)
If you think about it, the last two terms in this product can be found by recursively finding the number of insertion sequences that work for the left and right subtrees. This means that if we can figure out how many possible interleavings there are for the two sequences, then we can determine how many possible insertion sequences will build up a given tree by evaluating the above expression recursively!
So how many interleavings are there? If we're given two sequences of length m and n with no duplicates in them, then we can come up with the number of interleavings of those sequences with the following observation. Consider the sequence
L L L ... L   R R R ... R
-----------   -----------
  m times       n times
Any permutation of this sequence will give back a series of Ls and Rs where the number of Ls is equal to the number of elements in the sequence of length m and the number of Rs is equal to the number of elements in the sequence of length n. We can interpret this sequence as a way of describing how to build up the interleaving - any time we see L, we insert an element from the left sequence, and any time we see an R, we insert an element from the right sequence. For example, consider the sequences 4, 1, 0, 3 and 2, 7. Then the permutation LLRLRL gives the sequence
4, 1, 2, 0, 3, 7
L  L  R  L  R  L
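In code, decoding such a pattern back into an interleaving is a one-liner; this helper is just an illustration of the idea, not something from the original answer:

def interleave(pattern, left_seq, right_seq):
    # Consume the next left element on 'L' and the next right element on 'R',
    # preserving the internal order of both sequences.
    left, right = iter(left_seq), iter(right_seq)
    return [next(left) if c == 'L' else next(right) for c in pattern]

print(interleave("LLRLRL", [4, 1, 0, 3], [2, 7]))  # [4, 1, 2, 0, 3, 7]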
If we start off with a sequence of m L's and n R's, the number of distinct permutations comes back as
 (m + n)!
----------  =  (m + n) choose m
  m!  n!
This holds because there are (m + n)! possible ways of reordering the Ls and the Rs if they were all distinct. Since they aren't, every ordering is counted m! n! too many times because we can permute the L's indistinguishably and the R's indistinguishably. Another way to think about this is to consider the set {1, 2, 3, ..., m + n} of indices in the interleaving, then to choose m of them to fill with elements from the first sequence, implicitly filling the rest of them with elements from the right sequence.
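For instance, with m = 3 and n = 3 (the sizes of the two subtrees in the example tree), both ways of counting give 20; a quick check, assuming Python 3.8+ for math.comb:

from math import comb, factorial

m, n = 3, 3  # the subtree sizes in the example above
via_factorials = factorial(m + n) // (factorial(m) * factorial(n))
print(via_factorials, comb(m + n, m))  # 20 20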
Okay... we now have a way of determining the number of ways of interleaving two sequences of length m and n. Therefore, we have the following:
# ways = (# interleaves of SL and SR) × (# possible SLs) × (# possible SRs)
= ((m + n) choose n) × (# possible SLs) × (# possible SRs)
Where m is the number of elements in the left subtree and n is the number of elements in the right subtree. Yay!
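As a sanity check, the example tree from the question gives back the answer of 80: each three-node subtree (rooted at 2 and at 6) can be built in choose(2, 1) × 1 × 1 = 2 ways, and the root's two subtrees have three nodes each, so the total is choose(6, 3) × 2 × 2 = 20 × 2 × 2 = 80.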
We can therefore write out pseudocode for this algorithm:
function countInsertionOrderings(Tree t) {
    if t is null, return 1;
    otherwise:
        let m = numElementsIn(t.left);
        let n = numElementsIn(t.right);
        return choose(m + n, n) *
               countInsertionOrderings(t.left) *
               countInsertionOrderings(t.right);
}
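For something directly runnable, here is one possible Python translation of that pseudocode; the Node class, the size helper, and the use of math.comb are choices made for this sketch rather than part of the answer above:

from math import comb

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def size(t):
    # Number of nodes in the subtree; a real implementation would cache this.
    return 0 if t is None else 1 + size(t.left) + size(t.right)

def count_insertion_orderings(t):
    if t is None:
        return 1
    m, n = size(t.left), size(t.right)
    return (comb(m + n, n)
            * count_insertion_orderings(t.left)
            * count_insertion_orderings(t.right))

# The tree from the question:
tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(count_insertion_orderings(tree))  # 80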
This algorithm performs a total of O(n) multiplications, where n is the number of nodes in the tree, and visits every node exactly once (assuming the number of elements in each subtree is cached at its root). This does not mean the algorithm runs in time O(n), though, because the work required to multiply these numbers together grows rapidly as the numbers themselves get larger.
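To get a feel for how quickly the intermediate numbers grow, here is a small sketch (an addition, reusing the same formula) that computes the count for a perfectly balanced tree purely from subtree sizes and prints how many bits the result needs:

from math import comb

def count_for_perfect_tree(size):
    # For a perfectly balanced tree the count depends only on the size,
    # since both subtrees are themselves perfectly balanced.
    if size <= 1:
        return 1
    half = (size - 1) // 2
    return comb(size - 1, half) * count_for_perfect_tree(half) ** 2

for nodes in (7, 15, 31, 63):
    print(nodes, count_for_perfect_tree(nodes).bit_length(), "bits")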
Hope this helps!
This is an interesting question. I implemented this idea in Python, and the recursion plus memoization has pretty good performance. Here "seq" is the input list of unique integers:
def answer(seq):
    from math import factorial
    BStore = dict()  # memo for binomial coefficients
    Store = dict()   # memo for the recursion step

    def TreeGenerator(a, L):
        # For a given root value a and a list L, return the values that fall
        # into the left subtree and the right subtree, as two lists.
        LT = []
        RT = []
        for i in L:
            if i < a:
                LT.append(i)
            else:
                RT.append(i)
        return [LT, RT]

    def binomial_coefficient(n, k):
        d = n - k
        if d < 0:
            return 0
        # Integer division keeps the result exact for large n.
        return factorial(n) // (factorial(k) * factorial(d))

    def binomial(n, k):
        if (n, k) in BStore:
            return BStore[n, k]
        solution = binomial_coefficient(n, k)
        BStore[n, k] = solution
        return solution

    def R(S):  # recursion starts here
        key = tuple(S)
        if key in Store:
            return Store[key]
        if len(S) < 2:
            results = 1
        else:
            a = S[0]
            rest = S[1:]
            LT, RT = TreeGenerator(a, rest)
            results = binomial(len(LT) + len(RT), len(RT)) * R(LT) * R(RT)
        Store[key] = results
        return results

    return R(seq)
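Calling it on one valid insertion order for the tree from the question (its preorder sequence, for example) reproduces the count of 80:

print(answer([4, 2, 1, 3, 6, 5, 7]))  # 80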