Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does the binary search algorithm works for this 1D" Peak finding" problem?

I was looking into MIT's open courseware first lecture on an introduction to algorithms and there is something that is not terribly obvious to me. You cant start watching the lecture at 24:30 here and the lecture notes with all the details of the 1D peak problem definition and solution in here

The problem goes:

for an array of "n" integer elements find a peak

and gives an example array of size 8: example = [6,7,4,3,2,1,4,5]

Definition of a peak For the example array above example[1] and example[7] are "peaks" because those numbers are greater or equal than their adjacent element and a special condition for the last element of the array applies that it only needs to be greater than or equal to the element preceding it. that is:

example[1] >= example[0] && example[1] >= example[2] #=> is true and therefore a peak
example[7] >= example[6] #=> is true and therefore a peak

Important observations From this example we can appreciate that the array may be unsorted, that it may contain duplicates and that it may contain more than one peak and to my interpretation it may even not contain any single peak.

So far so good but my troubles started when he goes to argue that a definition of splitting the array in binary search tree makes to find a peak, ** that may be terribly obvious to everyone in that class but not for me, it seems arbitrary or I failed to understand something very important**

The professor goes to define in pseudo-code a binary search algorithm to find a peak:

pseudo code definition of a binary search algorithm

My questions/concerns

  1. Given the condition above in A why go to the left? instead of the right?
  2. Given the condition above in B why go to the right? instead of the left?
  3. The binary search algorithm assumes we start from a sorted array, so how come it makes sense to apply it to data that may be unsorted?
  4. Does the solution guarantees that for cases when there is only one "peak" we don't initially discard the half that contains the peak? If so how/why?

Since the array can be unsorted and it may contain duplicates I don't understand where is the guarantee that if conditions in either A or B hold to be true it would make sense to go look for either right or left, it seems arbitrary to me and that if you choose wrong you could discard the half of the array that actually might have the only peak

Am I missing something important? If so what?

like image 718
Oscar Ortiz García Avatar asked Oct 15 '22 02:10

Oscar Ortiz García


1 Answers

Given the condition above in A why go to the left? instead of the right?

If you would go to the right (without first checking condition B), there is a small probability that the values at the right will keep going down (from left to right), and you would not find a peak there.

At the left side, however, you know that you cannot have that situation, as you have found at least one value that is higher (the neighbour) and could potentially even be a peak. Here is how you can prove that a peak exists at the left side (this is not a description of the algorithm; just a way to prove it):

If the immediate neighbour is not a peak, then possibly the next one to its left is. If not, then possibly the next one to its left....etc. This series will end when finding a peak or arriving at the left most value. If none of the others were peaks, then this one must be it. This only happens when the values never decreased while looking further to the left.

In short, whatever the situation at the left side, there is a peak somewhere there at that side, and that is all we need to know when choosing a side.

Given the condition above in B why go to the right? instead of the left?

This is of course the same reasoning, but mirrored.

Note that you can decide to first check condition B and only then A. When both conditions are true at the same time, you can actually choose which side to go. This is where you got the feeling from that the choice looks "arbitrary". It is indeed arbitrary when both conditions A and B are true.

But also think about the case where one of A and B is true and the other false. If you would go the wrong (downward) way, you have no guarantee that values would ever go up in that direction. And so there is a small probability that there is no peak on that side.

Of course, there still could be a peak on that side, but since we are sure there is one on the other side, it is wise to go for certainty. We don't care about potentially discarding some peaks, as we only need to find one peak.

The binary search algorithm assumes we start from a sorted array, so how come it makes sense to apply it to data that may be unsorted?

Binary search for a particular value would only work in a sorted array, but here we are not looking for a particular value. The conditions of the value we are looking for are less strict. Instead of a particular value, we will be happy with any value that is a local peak.

like image 188
trincot Avatar answered Oct 19 '22 02:10

trincot