
Finding elements in a BST that sum up to a provided value

I'm trying to come up with an approach to solve this problem:

Find two elements in a balanced BST that sum to a given value.

Constraints: time O(n) and space O(log n).

I was wondering whether the following algorithm would be valid.

int[] findSum(Node root, int sum){
   int[] sumArray = new int[2];   // holds the pair, if one is found
   for (int i = 0; i < sum; i++) {
      // check whether both i and sum - i appear in the BST
      if (root.contains(i) && root.contains(sum - i)) {
         sumArray[0] = i;
         sumArray[1] = sum - i;
         return sumArray;
      }
   }
   return null;   // no pair found
}

I understand that my approach could be wrong. I would appreciate any feedback or corrections to my pseudocode, or better algorithms.

asked Apr 23 '13 by seeker



1 Answer

I believe the approach you have will work, but it does not meet the required time constraints.

Let's start off by analyzing the complexity of this algorithm. Note that there are two different parameters to take into consideration here. First, there's the total number of elements in the BST. If you make the BST larger or smaller, it will take more or less time for the algorithm to complete. Let's call this number n. Second, there's the total number you want the values to sum up to. Let's call that value U.

So let's see what your current algorithm does. Right now, it iterates a loop O(U) times, on each iteration checking whether two values exist within the BST. Each BST lookup takes time O(log n), so the total amount of work your algorithm does is O(U log n). However, your algorithm only uses constant space (that is, space O(1)).

Unfortunately, this runtime is not at all good. If the target number is very large (say, 1,000,000,000), then your algorithm is going to take a very long time to finish because U is going to be very large.

So the question now is how you can solve this problem more efficiently. Fortunately, there's a very cute algorithm we can use to solve the problem, and it will leverage the fact that the elements of the BST are stored in sorted order.

Let's begin by solving a similar problem that is a bit different from the one you're posing. Suppose that instead of giving you a BST and a target number, I give you a sorted array and a target number and then ask the same question: are there two numbers in this sorted array that sum up to the target? For example, I might give you this array:

 0  1  3  6  8  9  10  14  19

Let's suppose you wanted to know if two numbers in this array sum up to 16. How might you do this?

You could try the same approach you had before: check if the array contains 0 and 16, 1 and 15, 2 and 14, etc. until you found a pair or ran out of values to check. Checking whether an element exists in a sorted array takes time O(log n) using binary search, so this algorithm still takes O(U log n) time. (You could conceivably speed this up using interpolation search if you knew that the values were nicely distributed, which would give O(U log log n) runtime on expectation, but that large U term is still a problem).
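As a rough sketch (the class and method names here are purely illustrative), that value-by-value approach on a sorted array might look like this:

import java.util.Arrays;

class BruteForcePairFinder {
   // Checks every candidate pair (i, target - i) for i in [0, target).
   // Each check is a binary search, so this runs in O(U log n) time.
   // (Note: it doesn't verify that i and target - i are distinct elements.)
   static int[] findPair(int[] sorted, int target) {
      for (int i = 0; i < target; i++) {
         if (Arrays.binarySearch(sorted, i) >= 0
               && Arrays.binarySearch(sorted, target - i) >= 0) {
            return new int[] { i, target - i };
         }
      }
      return null;
   }
}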

So let's consider a different approach. Fundamentally, what you're doing requires you to explicitly enumerate all pairs that sum up to U. However, most of them aren't going to be there, and, in fact, most of the time neither element in the pair will be there. We could make things a lot faster by using the following algorithm:

  • For each element x of the array, check if U - x is in the array.
  • If so, report success.
  • Otherwise, if no such pair exists, report failure.

This algorithm will require you to look at a total of O(n) elements in the array, each time doing O(log n) work to find the matching pair. In the worst case, this will take O(n log n) time and use O(1) space. This is much better than before if U is a huge number, because there's no longer any dependence on U at all!
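Here's a rough sketch of that element-by-element approach (again, the names are just illustrative):

import java.util.Arrays;

class BinarySearchPairFinder {
   // For each element x, binary-search the rest of the array for U - x.
   // O(n log n) time, O(1) extra space.
   static int[] findPair(int[] sorted, int target) {
      for (int i = 0; i < sorted.length; i++) {
         int x = sorted[i];
         // Search only to the right of i so x never pairs with itself.
         int j = Arrays.binarySearch(sorted, i + 1, sorted.length, target - x);
         if (j >= 0) {
            return new int[] { x, target - x };
         }
      }
      return null;
   }
}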

However, we can speed things up even further by making a clever observation about the structure of the problem. Let's suppose that we are looking at the number x in the array and do a binary search to look for U - x. If we find it, we're done. If not, we'll find the first number in the array that's greater than U - x. Let's call that number z.

So now suppose that we want to see if a number y could be part of the pair that sums up to U, and moreover, suppose that y is bigger than x. In that case we know that

 y + z > x + z > x + (U - x) = U

What this says is that the sum of y and z is greater than U, so it can't possibly be U. What does this mean? Well, we found z by trying to do a binary search for the element that would pair with x to sum up to U. What we've just shown is that if you try to add z to any number in the array that's bigger than x, the total has to exceed U. In other words, z can't pair with anything bigger than x. Similarly, anything bigger than z can't pair with anything bigger than x, because it would have to sum up to something larger than U.

Based on this observation, we can try using a different algorithm. Let's take our array, as before, and see if we can find a pair that sums to 16:

 0  1  3  6  8  9  10  14  19

Let's maintain two pointers - a "left-hand side" pointer lhs and a "right-hand side" pointer rhs:

 0  1  3  6  8  9  10  14  19
 ^                          ^
lhs                        rhs

When we sum up these two numbers, we get back 19, which exceeds U. Now, any pair of numbers that we add up has to have its lower number be at least as large as the lhs number, which is 0. Therefore, if we tried summing up any pair in here that uses the number 19, we know that the sum would be too large. Therefore, we can eliminate from consideration any pair containing 19. This leaves

 0  1  3  6  8  9  10  14  19
 ^                      ^
lhs                    rhs

Now, look at this sum (14), which is too small. Using similar logic as before, we can safely say that any sum in the remaining array that uses 0 must end up giving a sum smaller than 16, because the second number in the sum would be at most 14. Therefore, we can rule out the 0:

 0  1  3  6  8  9  10  14  19
    ^                   ^
   lhs                 rhs

We're beginning to see a pattern:

  • If the sum is too small, advance lhs.
  • If the sum is too big, decrement rhs.

Eventually, we will either find a pair that sums up to 16, or lhs and rhs will bump into one another, at which point we're guaranteed that no pair sums up to 16.

Tracing through this algorithm, we get

 0  1  3  6  8  9  10  14  19
    ^                   ^
   lhs                 rhs    (sum is 15, too small)

 0  1  3  6  8  9  10  14  19
       ^                ^
      lhs              rhs    (sum is 17, too big)

 0  1  3  6  8  9  10  14  19
       ^            ^
      lhs          rhs        (sum is 13, too small)

 0  1  3  6  8  9  10  14  19
          ^         ^
         lhs       rhs        (sum is 16, we're done!)

Et voila! We've got our answer.

So how fast is this? Well, on each iteration, either lhs moves forward or rhs moves backward, and the algorithm stops when they meet. This means we do O(n) total iterations, each of which does at most O(1) work. This gives a total time complexity of O(n).

How about space? Well, we need to store two pointers, each of which takes up O(1) space, so the total space usage is O(1). Great!
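If it helps, here's a rough sketch of the two-pointer version on a sorted array (names are illustrative):

class TwoPointerPairFinder {
   // Returns a pair from the sorted array summing to target, or null.
   // O(n) time, O(1) space.
   static int[] findPair(int[] sorted, int target) {
      int lhs = 0;
      int rhs = sorted.length - 1;
      while (lhs < rhs) {
         int sum = sorted[lhs] + sorted[rhs];
         if (sum == target) {
            return new int[] { sorted[lhs], sorted[rhs] };
         } else if (sum < target) {
            lhs++;        // sum too small: advance lhs
         } else {
            rhs--;        // sum too big: move rhs back
         }
      }
      return null;        // pointers met without finding a pair
   }
}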

But what does this have to do with your problem? The connection is this: at each point in time, the algorithm only needs to remember two numbers in the array, and it only needs to be able to step from one element to the next or from one element to the previous. That's all it has to do.

So suppose that instead of using a sorted array, you use a binary search tree. Start off with two pointers, one to the smallest node and one to the largest. Once you have this setup, you can simulate the above algorithm, replacing "increment lhs" and "decrement rhs" with "move to the inorder successor of lhs" and "move to the inorder predecessor of rhs." These operations can be implemented so that they use a total of O(log n) space (assuming the tree is balanced) and so that any sequence of n operations of each type takes O(n) time in total (regardless of whether the tree is balanced). Consequently, if you were to use the above algorithm modified to work on a BST, you would get an algorithm that runs in O(n) time and uses only O(log n) space!

The implementation details are a bit tricky, and I'll leave that part as an exercise to you. If you aren't familiar with doing inorder successor and predecessor, there are many good resources available online. They're standard algorithms on BSTs and are super useful to know about. I'll also leave the formal proof of correctness of the array algorithm as an exercise, though it's not too hard.
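That said, here's a rough sketch of one way the two inorder iterators could be maintained with explicit stacks (the Node class and helper names are my own illustrative assumptions):

import java.util.ArrayDeque;
import java.util.Deque;

class BstPairFinder {
   static class Node {
      int value;
      Node left, right;
   }

   // Returns a pair of values in the BST summing to target, or null.
   // Each stack holds at most one root-to-leaf path, so the extra space
   // is O(log n) when the tree is balanced.
   static int[] findPair(Node root, int target) {
      Deque<Node> fwd = new ArrayDeque<>();   // walks values in increasing order
      Deque<Node> bwd = new ArrayDeque<>();   // walks values in decreasing order
      pushLeft(fwd, root);
      pushRight(bwd, root);

      while (!fwd.isEmpty() && !bwd.isEmpty()) {
         Node lo = fwd.peek();
         Node hi = bwd.peek();
         if (lo == hi) break;                 // iterators met: no pair exists
         int sum = lo.value + hi.value;
         if (sum == target) {
            return new int[] { lo.value, hi.value };
         } else if (sum < target) {
            fwd.pop();                        // advance to the inorder successor
            pushLeft(fwd, lo.right);
         } else {
            bwd.pop();                        // move to the inorder predecessor
            pushRight(bwd, hi.left);
         }
      }
      return null;
   }

   // Push node and all its left descendants (the next smallest values to visit).
   private static void pushLeft(Deque<Node> stack, Node node) {
      while (node != null) {
         stack.push(node);
         node = node.left;
      }
   }

   // Push node and all its right descendants (the next largest values to visit).
   private static void pushRight(Deque<Node> stack, Node node) {
      while (node != null) {
         stack.push(node);
         node = node.right;
      }
   }
}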

Hope this helps!

answered Oct 31 '22 by templatetypedef