Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding multiple entries with binary search

I use standard binary search to quickly return a single object in a sorted list (with respect to a sortable property).

Now I need to modify the search so that ALL matching list entries are returned. How should I best do this?

like image 606
Gruber Avatar asked Aug 27 '12 15:08

Gruber


People also ask

Can binary search find multiple values?

As Tim Roberts said that the BinarySearch is used to search the entire sorted List<T> for an element using the default comparer and returns the zero-based index of the element. So you can't get multiple values.

Can you use binary search on a list with duplicates?

Short answer: you don't, it is not its purpose. A binary search only gives you the position of the value you want, or the position of 1 of them if duplicated. To display all duplicates and indexes, you need to do a secondary search around the position returned by binary search routine.

Can you binary search a list?

Binary search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed down the possible locations to just one. We used binary search in the guessing game in the introductory tutorial.

How many searches does binary search take?

Once the reasonable portion contains just one element, no further guesses occur; the guess for the 1-element portion is either correct or incorrect, and we're done. So with an array of length 8, binary search needs at most four guesses.


2 Answers

Well, as the list is sorted, all the entries you are interested in are contiguous. This means you need to find the first item equal to the found item, looking backwards from the index which was produced by the binary search. And the same about last item.

You can simply go backwards from the found index, but this way the solution may be as slow as O(n) if there are a lot of items equal to the found one. So you should better use exponential search: double your jumps as you find more equal items. This way your whole search is still O(log n).

like image 158
Vlad Avatar answered Sep 23 '22 17:09

Vlad


First let's recall the naive binary search code snippet:

int bin_search(int arr[], int key, int low, int high) {     if (low > high)         return -1;      int mid = low + ((high - low) >> 1);      if (arr[mid] == key) return mid;     if (arr[mid] > key)         return bin_search(arr, key, low, mid - 1);     else         return bin_search(arr, key, mid + 1, high); } 

Quoted from Prof.Skiena: Suppose we delete the equality test if (s[middle] == key) return(middle); from the implementation above and return the index low instead of −1 on each unsuccessful search. All searches will now be unsuccessful, since there is no equality test. The search will proceed to the right half whenever the key is compared to an identical array element, eventually terminating at the right boundary. Repeating the search after reversing the direction of the binary comparison will lead us to the left boundary. Each search takes O(lgn) time, so we can count the occurrences in logarithmic time regardless of the size of the block.

So, we need two rounds of binary_search to find the lower_bound (find the first number no less than the KEY) and the upper_bound (find the first number bigger than the KEY).

int lower_bound(int arr[], int key, int low, int high) {     if (low > high)         //return -1;         return low;      int mid = low + ((high - low) >> 1);     //if (arr[mid] == key) return mid;      //Attention here, we go left for lower_bound when meeting equal values     if (arr[mid] >= key)          return lower_bound(arr, key, low, mid - 1);     else         return lower_bound(arr, key, mid + 1, high); }  int upper_bound(int arr[], int key, int low, int high) {     if (low > high)         //return -1;         return low;      int mid = low + ((high - low) >> 1);     //if (arr[mid] == key) return mid;      //Attention here, we go right for upper_bound when meeting equal values     if (arr[mid] > key)          return upper_bound(arr, key, low, mid - 1);     else         return upper_bound(arr, key, mid + 1, high); } 

Hope it's helpful :)

like image 45
user2696499 Avatar answered Sep 20 '22 17:09

user2696499