Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find leftest duplicate of binary search result

Say I have an ordered array with lots of duplicates:

var array = [ 1, 1, 1, 1, 1,
              2, 2, 2, 2, 2,
              3, 3, 3, 3, 3,
              4, 4, 4, 4, 4,
              5, 5, 5, 5, 5, ];

I also have code to perform a binary search for the index of the closest value within a sorted array:

function binaryClosestIndexOf(array, value) {
  var mid,
    lo = 0,
    hi = array.length - 1;

  while (hi - lo > 1) {
    mid = (lo + hi) >>> 1;

    if (array[mid] > value)
      hi = mid;
    else
      lo = mid;
  }

  if (value - array[lo] <= array[hi] - value)
    return lo;
  else 
    return hi;
}

Performing a few example searches unveils my issue:

binaryClosestIndexOf(array, 3.5);
> 14 // array[14] = 3
binaryClosestIndexOf(array, 3.50001);
> 15 // array[15] = 4
binaryClosestIndexOf(array, 3.9);
> 15 // array[15] = 4
binaryClosestIndexOf(array, 4);
> 19 // array[19] = 4
binaryClosestIndexOf(array, 4.49999);
> 19 // array[19] = 4

As we can see, there is no issue with the algorithm, it does return the closest value. But it returns an interesting mixture of indices, from the leftest to the rightest.

I want to get the leftest duplicate index. I could introduce an O(n) search after the binary search, iterating through each value in the array after until a value is found that is smaller than the current value. I don't want to do this.

Is there a way to elegantly perform a binary search that will end up with the leftest duplicate value? Bonus points for an algorithm for the rightest value, too!

like image 239
Nick Bull Avatar asked Nov 25 '25 22:11

Nick Bull


1 Answers

Being a binary search, if you search for an exact value, you are not promised any location (rightest or leftest), it could be in the middle.

Since binary search works by having a sorted list and reducing by factors of two finding an edge index could be difficult.

I can think of two approaches

  1. use a loop afterwards, I think you could make that to be expected O(log(n)) using randomness as you could say the final loop would be expected constant time O(1).
  2. Use a second binary search (once you know the value) for the index closest to that number minus 0.000001 (in your list 4 cases this would always result in the second run searching for 3.99999, which would yield 15. Note: You should check in case the number (3.999999) was in the list and move right one place to get your value unless you can ensure a certain degree of rounding in the list. This would be 2*log(n) or O(log(n)).

If your list is long, I think the expected run time for option 2 would actually be longer than option 1 because 2*log(n) would be > log(n) + a constant unless you know there will be lots of duplicates.

like image 196
Loren Avatar answered Nov 27 '25 11:11

Loren



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!