I have an interesting problem that given two sorted arrays:
a with n elements , b with n-1 elements.
b has all the elements of a except one element is missing.
How to find that element in O(log n) time?
I have tried this code:
def lostElements2(a, b):
if len(a)<len(b):
a, b = b, a
l, r = 0, len(a)-1
while l<r:
m = l + (r-l)//2
if a[m]==b[m]:
l = m+1
else:
r = m - 1
return a[r]
print(lostElements2([-1,0,4,5,7,9], [-1,0,4,5,9]))
I am not getting what should I return in the function, should it be a[l], a[r]?
I am getting how the logic inside the function should be: if the mid values of both arrays match, it means, b till the mid point is the same as a, and hence the missing element must be on the right of mid.
But am not able to create a final solution, when should the loop stop and what should be returned? How will it guarantee that a[l] or a[r] is indeed the missing element?
The point of l
and r
should be that l
is always a position where the lists are equal, while r
is always a position where they differ. Ie.
a[l]==b[l]
and a[r]!=b[r]
The only mistake in the code is to update r
to m-1
instead of m
. If we know that a[m]!=b[m]
, we can safely set r=m
. But setting it to m-1
risks getting a[r]==b[r]
, which breaks the algorithm.
def lostElements2(a, b):
if len(a) < len(b):
a, b = b, a
if a[0] != b[0]:
return a[0]
l, r = 0, len(a)-1
while l < r:
m = l + (r-l)//2
if a[m] == b[m]:
l = m+1
else:
r = m # The only change
return a[r]
(As @btilly points out, this algorithm fails if we allow for repeated values.)
edit from @btilly
To fix that potential flaw, if the values are equal, we search for the range with the same value. To do that we walk forward in steps of size 1, 2, 4, 8 and so on until the value switches, then do a binary search. And walk backwards the same. Now look for a difference at each edge.
The effort required for that search is O(log(k))
where k
is the length of the repeated value. So we are now replacing O(log(n))
lookups with searches. If there is an upper bound K
on the length of that search, that makes the overall running time. O(log(n)log(K))
. That makes the worst case running time O(log(n)^2)
. If K
is close to sqrt(n)
, it is easy to actually hit that worst case.
I claimed in a comment that if at most K
elements are repeated more than K
times then the running time is O(log(n)log(K))
. On further analysis, that claim is wrong. If K = log(n)
and the log(n)
runs of length sqrt(n)
are placed to hit all the choices of the search, then you get running time O(log(n)^2)
and not O(log(n)log(log(n)))
.
However if at most log(K)
elements are repeated more than K
times, then you DO get a running time of O(log(n)log(K))
. Which should be good enough for most cases. :-)
The principle of this problem is simple, the details are hard.
You have arranged that array a
is the longer one. Good, that simplifies life. Now you need to return the value of a
at the first position where the value of a
differs from the value of b
.
Now you need to be sure to deal with the following edge cases.
a
has a value.a = [1, 1, 2, 2, 2, 2, 3]
while b = [1, 2, 2, 2, 2, 3]
- when you land in the middle the fact that the values match can mislead you!Good luck!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With