Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding Missing Element in an Array

I have an interesting problem that given two sorted arrays:

a with n elements , b with n-1 elements.

b has all the elements of a except one element is missing.

How to find that element in O(log n) time?

I have tried this code:

def lostElements2(a, b):
    if len(a)<len(b):
        a, b = b, a

    l, r = 0, len(a)-1

    while l<r:
        m = l + (r-l)//2

        if a[m]==b[m]:
            l = m+1
        else:
            r = m - 1

    return a[r]


print(lostElements2([-1,0,4,5,7,9], [-1,0,4,5,9]))  

I am not getting what should I return in the function, should it be a[l], a[r]?

I am getting how the logic inside the function should be: if the mid values of both arrays match, it means, b till the mid point is the same as a, and hence the missing element must be on the right of mid.

But am not able to create a final solution, when should the loop stop and what should be returned? How will it guarantee that a[l] or a[r] is indeed the missing element?

like image 253
mourinho Avatar asked Dec 19 '22 01:12

mourinho


2 Answers

The point of l and r should be that l is always a position where the lists are equal, while r is always a position where they differ. Ie. a[l]==b[l] and a[r]!=b[r]

The only mistake in the code is to update r to m-1 instead of m. If we know that a[m]!=b[m], we can safely set r=m. But setting it to m-1risks getting a[r]==b[r], which breaks the algorithm.

def lostElements2(a, b):
    if len(a) < len(b):
        a, b = b, a
    if a[0] != b[0]:
        return a[0]

    l, r = 0, len(a)-1
    while l < r:
        m = l + (r-l)//2
        if a[m] == b[m]:
            l = m+1
        else:
            r = m # The only change
    return a[r]

(As @btilly points out, this algorithm fails if we allow for repeated values.)

edit from @btilly

To fix that potential flaw, if the values are equal, we search for the range with the same value. To do that we walk forward in steps of size 1, 2, 4, 8 and so on until the value switches, then do a binary search. And walk backwards the same. Now look for a difference at each edge.

The effort required for that search is O(log(k)) where k is the length of the repeated value. So we are now replacing O(log(n)) lookups with searches. If there is an upper bound K on the length of that search, that makes the overall running time. O(log(n)log(K)). That makes the worst case running time O(log(n)^2). If K is close to sqrt(n), it is easy to actually hit that worst case.

I claimed in a comment that if at most K elements are repeated more than K times then the running time is O(log(n)log(K)). On further analysis, that claim is wrong. If K = log(n) and the log(n) runs of length sqrt(n) are placed to hit all the choices of the search, then you get running time O(log(n)^2) and not O(log(n)log(log(n))).

However if at most log(K) elements are repeated more than K times, then you DO get a running time of O(log(n)log(K)). Which should be good enough for most cases. :-)

like image 151
kuppern87 Avatar answered Dec 27 '22 01:12

kuppern87


The principle of this problem is simple, the details are hard.

You have arranged that array a is the longer one. Good, that simplifies life. Now you need to return the value of a at the first position where the value of a differs from the value of b.

Now you need to be sure to deal with the following edge cases.

  1. The differing value is the last (ie at a position where only array a has a value.
  2. The differing value is the very first. (Binary search algorithms are easy to screw up for this case.
  3. There is a run the same. That is a = [1, 1, 2, 2, 2, 2, 3] while b = [1, 2, 2, 2, 2, 3] - when you land in the middle the fact that the values match can mislead you!

Good luck!

like image 39
btilly Avatar answered Dec 27 '22 02:12

btilly