Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary search Python Why do we use mid + 1 or mid - 1

I am learning binary search and the sample code uses" low = mid +1 and high = mid -1" but I don't understand why we do not use "low = mid and high = mid " instead?

def binarysearch(sequence, value):
    lo, hi = 0, len(sequence) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sequence[mid] < value:
            lo = mid + 1
        elif value < sequence[mid]:
            hi = mid - 1
        else:
            return mid
    return None

my_list = [1, 3, 5, 7, 9]
binarysearch(my_list, 3)
like image 875
Mila Avatar asked Feb 13 '26 08:02

Mila


2 Answers

The reason is to avoid overlapping searches; it places the boundary of the search ranges on the items that have not been checked yet.

For instance: if mid is at index 10, the next search left will look at values up to index 9, and the right search from values at index 11.

                    mid
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
|                 |    |                           |
<-from 0 to mid-1->    <-- from mid+1 to the end --> 

note that in this example, the boundary values are included        
like image 159
Reblochon Masque Avatar answered Feb 14 '26 22:02

Reblochon Masque


Please let me try to explain why the binary search works. Let's say we've the following sequence A=[-5, 10, 14, 33, 42, 42, 42] and searched_value = 14 we would have:

Iteration 1,     lo = 0, hi = 6, mid = 3     A[mid] > 14
Iteration 2,     lo = 0, hi = 2, mid = 1     A[mid] < 14
Iteration 3,     lo = 2, hi = 2, mid = 2     A[mid] = 14  (found!)

At each iteration of the algorithm we could conclude that lo and hi always contains the position of the searched value, and we'll prove it by induction on the number of iterations:

Inductive Hypothesis: if the searched value is present in the sequence it will always be contained between lo and hi.

Base Case: At iteration 1 lo = 0 and hi = n-1 contains all elements, in consequence if the searched value is present in the sequence it'll be contained between lo and hi, and the invariant is trivially correct.

Inductive Step: At any step if the searched value is contained between lo and hi, it'll continue to be contained between lo and hi at the next iteration. We've 3 possibilities (and here is the answer to the question):

  • If A[mid] = searched_value: In this case the algorithm finishes reporting correctly the position of the searched value in the sequence and the invariant is correct because the searched value was between lo and hi.
  • If A[mid] < searched_value: Knowing that it's a sorted sequence, all the elements between A[lo...mid] < searched_value (including A[mid]), thus we can assign lo=mid+1 (safely searching only in the upper half) and the invariant will still be correct at the next iteration.
  • If A[mid] > searched_value: knowing that it's a sorted sequence, all the elements between A[mid...hi] > searched value (including A[mid]), thus we can assing hi=mid-1 (safely searching only in the lower half) and the invariant will still be correct at the next iteration.

Given that, at each iteration, the algorithm always searches on smaller segments of the sequence, the termination condition is guaranteed because either there's only 1 element that is searched_value or it is different and, at the next iteration, the algorithm is going to report that such element isn't present in the sequence.

In consequence the algorithm is proved correct (and the reason why we use mid+1 and mid-1).

like image 32
Wilfredo Avatar answered Feb 14 '26 23:02

Wilfredo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!