Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary Search using start < end vs. using start <= end

In binary search, we usually have low and high variables and typically there is a while loop that tests if low <= high, as shown in this code (from Wikipedia):

int SortedArray[max] = {....}

int BinarySearch (int key)
{
    int start = 0;
    int end = max - 1;
    int mid;
    while (start <= end)
    {
        mid = (start + end) / 2;
        if (key == a[mid])
            return mid;
        else if (key < a[mid])
            end = mid - 1;
        else start = mid + 1;
    }
    return -1;
}

When learning binary search, I was always taught the start <= end approach, but when seeing other implementations, I've seen a lot of people do while(start < end).

Is there an advantage to one versus the other? In my own native implementations, I do the <= approach but when I switch it out for <, the search fails.

Is there a rule of thumb for using one versus the other?

like image 933
javanewbie Avatar asked May 28 '17 19:05

javanewbie


People also ask

How does a binary search end?

It should be terminated when the search interval is empty, which means that if you don't have to find it, it means you haven't found it.

How do you avoid infinite loops in binary search?

To ensure binary search termination, make sure the mid (and so left or right) change on every iteration. If we can reason or determine that mid value will always change, we can be 100% confidence that the binary search will not be struck in the infinite loop.

Which of the following is terminating condition of failure for binary search?

Running loop with high initialized with Length.


2 Answers

even if your question is probably not super clear, I could infer you are talking about this kind of implementation of the binary search (here in C, from Wikipedia):

int SortedArray[max] = {....}

int BinarySearch (int key)
{
    int start = 0;
    int end = max - 1;
    int mid;
    while (start <= end)
    {
        mid = (start + end) / 2;
        if (key == a[mid])
            return mid;
        else if (key < a[mid])
            end = mid - 1;
        else start = mid + 1;
    }
    return -1;
}

If you replace start <= end by start < end, there will be cases where your algorithm will not give the good answer.

Let's think about two cases.

1 - You would like to search 1 in the list [1]. In that case, start = 0, end = 0 and the algorithm would return -1 if you change the loop condition.

2 - You would like to search 2 in the list [1, 2]. In that case, start = 0, end = 1. The algorithm will set mid = (0+1)/2=0 in C. Thus arr[mid] < key. It will make start = 1, end = 1. Again, if you stop the loop here, the algorithm will return -1 instead of 1.

And there are probably many other examples.

Have a nice day

like image 172
Alexis Clarembeau Avatar answered Sep 24 '22 01:09

Alexis Clarembeau


For low <= high, high is considered inclusive (high is part of the range we consider).

For low < high, high is considered exclusive (high is not part of the range we consider).

Both can be correct, but there will be minor differences in the rest of the code, specifically how high is initialised (high = length-1 versus high = length) and how it's updated (high = mid-1 versus high = mid).


Which one is better?

The main difference is that mid = (low + high) / 2 will be slightly different for each case.

More specifically, high will be 1 bigger in the exclusive case, thus, when high-low is even in the inclusive case, mid will stay the same, but when high-low is odd in the inclusive case, mid will be 1 element bigger in the exclusive case (this is because of rounding).

Let's consider an example:

length = 6
low = 0
highInclusive = 5, highExclusive = 6
midInclusive = 5/2 = 2, midExclusive = 6/2 = 3

As you can see, when there is no single middle element, one will pick the element to the left and the other will pick the element to the right.

While this will sometimes make the one faster and sometimes make the other faster, the average running time will be pretty much identical.

From a readability perspective, it might be slightly better (in my opinion) to use the exclusive one in languages with 0-based arrays and either one in languages with 1-based arrays, in order to minimise the number of -1's in the code. An argument could also be made to just stick to a single version in all languages, as to not require that people understand both versions or get confused between the two.

like image 43
Bernhard Barker Avatar answered Sep 24 '22 01:09

Bernhard Barker