
Finding elements in a BST that sum up to a provided value

I'm trying to come up with an approach to solve this problem:

Find two elements in a balanced BST that sum to a given value.

Constraints: time O(n) and space O(log n).

I was wondering whether the following algorithm would be valid.

int[] findSum(Node root, int sum){
   int[] sumArray = new int[2];   // holds the pair, if one is found
   for (int i = 0; i < sum; i++) {
      // check whether both i and sum - i appear in the BST
      if (root.contains(i) && root.contains(sum - i)) {
         sumArray[0] = i;
         sumArray[1] = sum - i;
         return sumArray;
      }
   }
   return null;   // no pair found
}

I understand that my approach could be wrong. I would appreciate any feedback or corrections to my pseudocode, or better algorithms.

asked Apr 23 '13 by seeker



1 Answer

I believe the approach you have will work, but it does not meet the required time constraints.

Let's start off by analyzing the complexity of this algorithm. Note that there are two different parameters to take into consideration here. First, there's the total number of elements in the BST. If you make the BST larger or smaller, it will take more or less time for the algorithm to complete. Let's call this number n. Second, there's the total number you want the values to sum up to. Let's call that value U.

So let's see what your current algorithm does. Right now, it iterates a loop O(U) times, on each iteration checking whether two values exist within the BST. Each BST lookup takes time O(log n), so the total amount of work your algorithm does is O(U log n). However, your algorithm only uses constant space (that is, space O(1)).

Unfortunately, this runtime is not at all good. If the target number is very large (say, 1,000,000,000), then your algorithm is going to take a very long time to finish because U is going to be very large.

So the question now is how you can solve this problem more efficiently. Fortunately, there's a very cute algorithm we can use to solve the problem, and it will leverage the fact that the elements of the BST are stored in sorted order.

Let's begin by solving a similar problem that is a bit different from the one you're posing. Suppose that instead of giving you a BST and a target number, I give you a sorted array and a target number and then ask the same question: are there two numbers in this sorted array that sum up to the target? For example, I might give you this array:

 0  1  3  6  8  9  10  14  19

Let's suppose you wanted to know if two numbers in this array sum up to 16. How might you do this?

You could try the same approach you had before: check if the array contains 0 and 16, 1 and 15, 2 and 14, etc. until you found a pair or ran out of values to check. Checking whether an element exists in a sorted array takes time O(log n) using binary search, so this algorithm still takes O(U log n) time. (You could conceivably speed this up using interpolation search if you knew that the values were nicely distributed, which would give O(U log log n) runtime on expectation, but that large U term is still a problem).
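As a rough sketch (the class and method names here are purely illustrative), that value-by-value approach on a sorted array might look like this:

import java.util.Arrays;

class BruteForcePairFinder {
   // Checks every candidate pair (i, target - i) for i in [0, target).
   // Each check is a binary search, so this runs in O(U log n) time.
   // (Note: it doesn't verify that i and target - i are distinct elements.)
   static int[] findPair(int[] sorted, int target) {
      for (int i = 0; i < target; i++) {
         if (Arrays.binarySearch(sorted, i) >= 0
               && Arrays.binarySearch(sorted, target - i) >= 0) {
            return new int[] { i, target - i };
         }
      }
      return null;
   }
}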

So let's consider a different approach. Fundamentally, what you're doing requires you to explicitly enumerate all pairs that sum up to U. However, most of them aren't going to be there, and, in fact, most of the time neither element in the pair will be there. We could make things a lot faster by using the following algorithm:

  • For each element x of the array, check if U - x is in the array.
  • If so, report success.
  • Otherwise, if no such pair exists, report failure.

This algorithm will require you to look at a total of O(n) elements in the array, each time doing O(log n) work to find the matching pair. In the worst case, this will take O(n log n) time and use O(1) space. This is much better than before if U is a huge number, because there's no longer any dependence on U at all!
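Here's a rough sketch of that element-by-element approach (again, the names are just illustrative):

import java.util.Arrays;

class BinarySearchPairFinder {
   // For each element x, binary-search the rest of the array for U - x.
   // O(n log n) time, O(1) extra space.
   static int[] findPair(int[] sorted, int target) {
      for (int i = 0; i < sorted.length; i++) {
         int x = sorted[i];
         // Search only to the right of i so x never pairs with itself.
         int j = Arrays.binarySearch(sorted, i + 1, sorted.length, target - x);
         if (j >= 0) {
            return new int[] { x, target - x };
         }
      }
      return null;
   }
}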

However, we can speed things up even further by making a clever observation about the structure of the problem. Let's suppose that we are looking at the number x in the array and do a binary search to look for U - x. If we find it, we're done. If not, we'll find the first number in the array that's greater than U - x. Let's call that number z.

So now suppose that we want to see if a number y could be part of the pair that sums up to U, and moreover, suppose that y is bigger than x. In that case we know that

 y + z > x + z > x + (U - x) = U

What this says is that the sum of y and z is greater than U, so it can't possibly be U. What does this mean? Well, we found z by trying to do a binary search for the element that would pair with x to sum up to U. What we've just shown is that if you try to add z to any number in the array that's bigger than x, the total has to exceed U. In other words, z can't pair with anything bigger than x. Similarly, anything bigger than z can't pair with anything bigger than x, because it would have to sum up to something larger than U.

Based on this observation, we can try using a different algorithm. Let's take our array, as before, and see if we can find a pair that sums to 16:

 0  1  3  6  8  9  10  14  19

Let's maintain two pointers - a "left-hand side" pointer lhs and a "right-hand side" pointer rhs:

 0  1  3  6  8  9  10  14  19
 ^                          ^
lhs                        rhs

When we sum up these two numbers, we get back 19, which exceeds U. Now, any pair of numbers that we add up has to have its lower number be at least as large as the lhs number, which is 0. Therefore, if we tried summing up any pair in here that uses the number 19, we know that the sum would be too large. Therefore, we can eliminate from consideration any pair containing 19. This leaves

 0  1  3  6  8  9  10  14  19
 ^                      ^
lhs                    rhs

Now, look at this sum (14), which is too small. Using similar logic as before, we can safely say that any sum in the remaining array that uses 0 must end up giving a sum smaller than 16, because the second number in the sum would be at most 14. Therefore, we can rule out the 0:

 0  1  3  6  8  9  10  14  19
    ^                   ^
   lhs                 rhs

We're beginning to see a pattern:

  • If the sum is too small, advance lhs.
  • If the sum is too big, decrement rhs.

Eventually, we will either find a pair that sums up to 16, or lhs and rhs will bump into one another, at which point we're guaranteed that no pair sums up to 16.

Tracing through this algorithm, we get

 0  1  3  6  8  9  10  14  19
    ^                   ^
   lhs                 rhs    (sum is 15, too small)

 0  1  3  6  8  9  10  14  19
       ^                ^
      lhs              rhs    (sum is 17, too big)

 0  1  3  6  8  9  10  14  19
       ^            ^
      lhs          rhs        (sum is 13, too small)

 0  1  3  6  8  9  10  14  19
          ^         ^
         lhs       rhs        (sum is 16, we're done!)

Et voila! We've got our answer.

So how fast is this? Well, on each iteration, either lhs moves forward or rhs moves backward, and the algorithm stops when they meet. This means we do O(n) total iterations, each of which does at most O(1) work. This gives a total time complexity of O(n).

How about space? Well, we need to store two pointers, each of which takes up O(1) space, so the total space usage is O(1). Great!
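If it helps, here's a rough sketch of the two-pointer version on a sorted array (names are illustrative):

class TwoPointerPairFinder {
   // Returns a pair from the sorted array summing to target, or null.
   // O(n) time, O(1) space.
   static int[] findPair(int[] sorted, int target) {
      int lhs = 0;
      int rhs = sorted.length - 1;
      while (lhs < rhs) {
         int sum = sorted[lhs] + sorted[rhs];
         if (sum == target) {
            return new int[] { sorted[lhs], sorted[rhs] };
         } else if (sum < target) {
            lhs++;        // sum too small: advance lhs
         } else {
            rhs--;        // sum too big: move rhs back
         }
      }
      return null;        // pointers met without finding a pair
   }
}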

But what does this have to do with your problem? The connection is this: at each point in time, the algorithm only needs to remember two numbers in the array, and it only needs to be able to step from one element to the next or from one element to the previous. That's all it has to do.

So suppose that instead of using a sorted array, you use a binary search tree. Start off with two pointers, one to the smallest node and one to the largest. Once you have this setup, you can simulate the above algorithm, replacing "increment lhs" and "decrement rhs" with "move to the inorder successor of lhs" and "move to the inorder predecessor of rhs." These operations can be implemented so that they use a total of O(log n) space (assuming the tree is balanced) and so that any sequence of n operations of each type takes O(n) time in total (regardless of whether the tree is balanced). Consequently, if you were to use the above algorithm modified to work on a BST, you would get an algorithm that runs in O(n) time and uses only O(log n) space!

The implementation details are a bit tricky, and I'll leave that part as an exercise to you. If you aren't familiar with doing inorder successor and predecessor, there are many good resources available online. They're standard algorithms on BSTs and are super useful to know about. I'll also leave the formal proof of correctness of the array algorithm as an exercise, though it's not too hard.
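That said, here's a rough sketch of one way the two inorder iterators could be maintained with explicit stacks (the Node class and helper names are my own illustrative assumptions):

import java.util.ArrayDeque;
import java.util.Deque;

class BstPairFinder {
   static class Node {
      int value;
      Node left, right;
   }

   // Returns a pair of values in the BST summing to target, or null.
   // Each stack holds at most one root-to-leaf path, so the extra space
   // is O(log n) when the tree is balanced.
   static int[] findPair(Node root, int target) {
      Deque<Node> fwd = new ArrayDeque<>();   // walks values in increasing order
      Deque<Node> bwd = new ArrayDeque<>();   // walks values in decreasing order
      pushLeft(fwd, root);
      pushRight(bwd, root);

      while (!fwd.isEmpty() && !bwd.isEmpty()) {
         Node lo = fwd.peek();
         Node hi = bwd.peek();
         if (lo == hi) break;                 // iterators met: no pair exists
         int sum = lo.value + hi.value;
         if (sum == target) {
            return new int[] { lo.value, hi.value };
         } else if (sum < target) {
            fwd.pop();                        // advance to the inorder successor
            pushLeft(fwd, lo.right);
         } else {
            bwd.pop();                        // move to the inorder predecessor
            pushRight(bwd, hi.left);
         }
      }
      return null;
   }

   // Push node and all its left descendants (the next smallest values to visit).
   private static void pushLeft(Deque<Node> stack, Node node) {
      while (node != null) {
         stack.push(node);
         node = node.left;
      }
   }

   // Push node and all its right descendants (the next largest values to visit).
   private static void pushRight(Deque<Node> stack, Node node) {
      while (node != null) {
         stack.push(node);
         node = node.right;
      }
   }
}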

Hope this helps!

answered Oct 31 '22 by templatetypedef