I have two sorted lists, e.g.
a = [1, 4, 7, 8]
b = [1, 2, 3, 4, 5, 6]
I want to know for each item in a
if it is in b
. For the above example, I want to find
a_in_b = [True, True, False, False]
(or having the indices where a_in_b
is True
would be fine too).
Now, both a
and b
are very large, so complexity is an issue. If M = len(a)
and N = len(b)
. How can I do this with a complexity lower than M * O(N)
by making use of the fact that both lists are sorted?
The complexity is O(m log n). There are m iterations of the loop. Each insertion into a sorted array is an O(log n) operation. Therefore the overall complexity is O (m log n).
Write a SortedMerge() function that takes two lists, each of which is sorted in increasing order, and merges the two together into one list which is in increasing order. SortedMerge() should return the new list.
The idea is to use Merge function of Merge sort.
print(arr2[x] + " "); The sorted arrays are merged into a single array using a while loop. After the while loop, if any elements are left in arr1 or arr2, then they are added to the merged array.
Time Complexities of all Sorting Algorithms. Efficiency of an algorithm depends on two parameters: 1. Time Complexity. 2. Space Complexity. Time Complexity: Time Complexity is defined as the number of times a particular instruction set is executed rather than the total time is taken.
Sort both the arrays A and B, this will take O (nlogn) time complexity. 2. Take two pointers i and j, initialize both of them to 0. we will use i for array A and j for B. 3. Create a result array res of size n.
The second-fastest sorting algorithm is the one in the good enough sort library (perhaps the one in your programming language’s standard library) that you didn’t have to write. if we use comparison based sorting then best time complexity that we can achieve is O (nlogn).
Quick sort is the fastest sorting algorithm. merge sort requires O(N) extra space. Allocating and de-allocating the extra space used for merge sort increases the running time of the al...
You can iterate over your b
list manually within a loop over a
. You'll want to advance the b
iteration when the latest value you've seen from it is less than the current value from a
.
from math import inf
result = []
b_iter = iter(b) # create an iterator over b
b_val = -inf
for a_val in a:
while b_val < a_val:
b_val = next(b_iter, inf) # manually iterate on it
result.append(a_val == b_val)
This should have a running time of O(M+N)
, since each list item gets iterated over at most once (b
may not even need to be fully iterated).
You could avoid using floating point infinities if you want to, but you'd need to do a bit of extra work to handle some edge cases (e.g. if b
is empty).
Exploiting sorted'ness is a red-herring for time complexity: The ideal case is to iterate both in lockstep for O(n+m) complexity. This is the same as converting b
to a set for O(m), then searching the elements of a
in the set for O(n).
>>> a = [1, 4, 7, 8]
>>> b = [1, 2, 3, 4, 5, 6]
>>> bs = set(b) # create set for O(len(b))
>>> [item in bs for item in a] # check O(len(a)) items "in set of b" for O(1) each
[True, True, False, False]
Since most of these operations are builtin, the only costly operation is the iteration over a
which is needed in all solutions.
However, this will duplicate the references to the items in b
. If b
is treated as external to the algorithm, the space complexity is O(m+n) instead of the ideal case O(n) for just the answer.
Late answer, but a different approach to the problem using set()
uniqueness and O(1)
speed of len()
, i. e. :
a_in_b = []
a = [1,4,7,8]
b = [1,2,3,4,5,6]
b_set = set(b)
for v in a:
l1 = len(b_set)
b_set.add(v)
a_in_b.append(l1 == len(b_set))
Unfortunately, my approach isn't the fastest:
Benchmark
Use Binary Search here:
def bs(b,aele,start,end):
if start > end:
return False
mid = (start + end) // 2
if ale == b[mid]:
return True
if ale < b[mid]:
return bs(b, aele, start, mid-1)
else:
return bs(b, aele, mid+1, end)
For each element in a check if it exists in b using this method. Time Complexity: O(m*log(n))
Using sets the order doesn't even matter.
Turn b
to a set (O(N)
). Then iterate a
(O(M)
), and for each element check if it's in set_b
(O(1)
). This will give a time complexity of O(max(M, N))
:
a = [1, 4, 7, 8]
b = [1, 2, 3, 4, 5, 6]
set_b = set(b)
res = []
for elem in a:
res.append(elem in set_b)
This can of-course be shortened to a nifty list-comp:
res = [elem in set_b for elem in a]
Both give:
[True, True, False, False]
For your parenthesized request, simply iterate with enumerate
instead:
for i, elem in enumerate(a):
if elem in set_b:
res.append(i)
Which will give [0, 1]
.
You should use binary search algorithm(read about it if you don't know what it is).
The modified bin_search
function has to return position right
for which b[right] >= elem
- the first element in b
that is greater or equal than searched element from a
. This position will be used as the left position for next bin_search
call. Also bin_search
returns True as a second argument if it have found elem
in b
def bin_search(arr, elem, left):
right = len(arr)
while left < right:
mid = (left+right)//2
if arr[mid] == elem:
return (mid, True)
if arr[mid] < elem:
left = mid + 1
else:
right = mid
return (right, False)
def find_a_in_b(a, b):
new_left = 0
a_in_b = [False] * len(a)
# we could have used enumerate but size of a is too large
index = 0
for i in a:
new_left, a_in_b[index] = bin_search(b, i, new_left)
index += 1
return a_in_b
It's probably the best time
P.S. Forget it, i'm stupid and forgot about linear algorithm used in merge sort, so it's not the best
Go through a
and b
once:
a_in_b = []
bstart = 0
for ai in a:
print (ai,bstart)
if bstart == len(b):
a_in_b.append(False)
else:
for bi in b[bstart:]:
print (ai, bi, bstart)
if ai == bi:
a_in_b.append(True)
break
elif ai > bi:
if bstart < len(b):
bstart+=1
if bstart == len(b):
a_in_b.append(False)
continue
The obvious solution is actually O(M + N)
:
a = [1, 1, 4, 7, 8]
b = [1, 2, 3, 4, 5, 6]
c = [0] * len(a) # Or use a dict to stash hits ..
j = 0
for i in range(0, len(a)):
while j < len(b) - 1 and b[j] < a[i]:
j += 1
if b[j] == a[i]:
c[i] = 1
print(c)
For each i
in 0 ... N
where N
is length of a
, only a unique partition / sub-sequence of b
plus one border number is checked, making it O(M + N)
all together.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With