Binary search Python Why do we use mid + 1 or mid - 1

Question

I am learning binary search and the sample code uses" low = mid +1 and high = mid -1" but I don't understand why we do not use "low = mid and high = mid " instead?

def binarysearch(sequence, value):
    lo, hi = 0, len(sequence) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sequence[mid] < value:
            lo = mid + 1
        elif value < sequence[mid]:
            hi = mid - 1
        else:
            return mid
    return None

my_list = [1, 3, 5, 7, 9]
binarysearch(my_list, 3)

Reblochon Masque · Accepted Answer

The reason is to avoid overlapping searches; it places the boundary of the search ranges on the items that have not been checked yet.

For instance: if mid is at index 10, the next search left will look at values up to index 9, and the right search from values at index 11.

                    mid
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
|                 |    |                           |
<-from 0 to mid-1->    <-- from mid+1 to the end --> 

note that in this example, the boundary values are included

Wilfredo · Answer

Please let me try to explain why the binary search works. Let's say we've the following sequence A=[-5, 10, 14, 33, 42, 42, 42] and searched_value = 14 we would have:

Iteration 1,     lo = 0, hi = 6, mid = 3     A[mid] > 14
Iteration 2,     lo = 0, hi = 2, mid = 1     A[mid] < 14
Iteration 3,     lo = 2, hi = 2, mid = 2     A[mid] = 14  (found!)

At each iteration of the algorithm we could conclude that lo and hi always contains the position of the searched value, and we'll prove it by induction on the number of iterations:

Inductive Hypothesis: if the searched value is present in the sequence it will always be contained between lo and hi.

Base Case: At iteration 1 lo = 0 and hi = n-1 contains all elements, in consequence if the searched value is present in the sequence it'll be contained between lo and hi, and the invariant is trivially correct.

Inductive Step: At any step if the searched value is contained between lo and hi, it'll continue to be contained between lo and hi at the next iteration. We've 3 possibilities (and here is the answer to the question):

If A[mid] = searched_value: In this case the algorithm finishes reporting correctly the position of the searched value in the sequence and the invariant is correct because the searched value was between lo and hi.
If A[mid] < searched_value: Knowing that it's a sorted sequence, all the elements between A[lo...mid] < searched_value (including A[mid]), thus we can assign lo=mid+1 (safely searching only in the upper half) and the invariant will still be correct at the next iteration.
If A[mid] > searched_value: knowing that it's a sorted sequence, all the elements between A[mid...hi] > searched value (including A[mid]), thus we can assing hi=mid-1 (safely searching only in the lower half) and the invariant will still be correct at the next iteration.

Given that, at each iteration, the algorithm always searches on smaller segments of the sequence, the termination condition is guaranteed because either there's only 1 element that is searched_value or it is different and, at the next iteration, the algorithm is going to report that such element isn't present in the sequence.

In consequence the algorithm is proved correct (and the reason why we use mid+1 and mid-1).

Binary search Python Why do we use mid + 1 or mid - 1

Tags:

python

data-structures

binary-search

Mila

2 Answers

Reblochon Masque

Wilfredo

Recent Activity

Donate For Us

Binary search Python Why do we use mid + 1 or mid - 1

Tags:

python

data-structures

binary-search

Mila

2 Answers

Reblochon Masque

Wilfredo

Related questions

Recent Activity

Donate For Us