Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

'in' for two sorted lists with the lowest complexity

Tags:

python

I have two sorted lists, e.g.

a = [1, 4, 7, 8]
b = [1, 2, 3, 4, 5, 6]

I want to know for each item in a if it is in b. For the above example, I want to find

a_in_b = [True, True, False, False]

(or having the indices where a_in_b is True would be fine too).

Now, both a and b are very large, so complexity is an issue. If M = len(a) and N = len(b). How can I do this with a complexity lower than M * O(N) by making use of the fact that both lists are sorted?

like image 963
Tom de Geus Avatar asked Jan 19 '21 09:01

Tom de Geus


People also ask

What is the time complexity of merging two sorted lists?

The complexity is O(m log n). There are m iterations of the loop. Each insertion into a sorted array is an O(log n) operation. Therefore the overall complexity is O (m log n).

How do I join two sorted lists?

Write a SortedMerge() function that takes two lists, each of which is sorted in increasing order, and merges the two together into one list which is in increasing order. SortedMerge() should return the new list.

What sort of algorithm would you model if you need to combine 2 sorted arrays and create a sorted output?

The idea is to use Merge function of Merge sort.

How do you merge two sorted arrays into a single sorted array?

print(arr2[x] + " "); The sorted arrays are merged into a single array using a while loop. After the while loop, if any elements are left in arr1 or arr2, then they are added to the merged array.

What is the time complexity of sorting algorithms?

Time Complexities of all Sorting Algorithms. Efficiency of an algorithm depends on two parameters: 1. Time Complexity. 2. Space Complexity. Time Complexity: Time Complexity is defined as the number of times a particular instruction set is executed rather than the total time is taken.

How to sort an array using nlogn time complexity?

Sort both the arrays A and B, this will take O (nlogn) time complexity. 2. Take two pointers i and j, initialize both of them to 0. we will use i for array A and j for B. 3. Create a result array res of size n.

What is the second-fastest sorting algorithm?

The second-fastest sorting algorithm is the one in the good enough sort library (perhaps the one in your programming language’s standard library) that you didn’t have to write. if we use comparison based sorting then best time complexity that we can achieve is O (nlogn).

What is the fastest sorting algorithm in Python?

Quick sort is the fastest sorting algorithm. merge sort requires O(N) extra space. Allocating and de-allocating the extra space used for merge sort increases the running time of the al...


Video Answer


8 Answers

You can iterate over your b list manually within a loop over a. You'll want to advance the b iteration when the latest value you've seen from it is less than the current value from a.

from math import inf

result = []
b_iter = iter(b)                           # create an iterator over b
b_val = -inf
for a_val in a:
    while b_val < a_val:
        b_val = next(b_iter, inf)          # manually iterate on it
    result.append(a_val == b_val)

This should have a running time of O(M+N), since each list item gets iterated over at most once (b may not even need to be fully iterated).

You could avoid using floating point infinities if you want to, but you'd need to do a bit of extra work to handle some edge cases (e.g. if b is empty).

like image 152
Blckknght Avatar answered Nov 03 '22 23:11

Blckknght


Exploiting sorted'ness is a red-herring for time complexity: The ideal case is to iterate both in lockstep for O(n+m) complexity. This is the same as converting b to a set for O(m), then searching the elements of a in the set for O(n).

>>> a = [1, 4, 7, 8]
>>> b = [1, 2, 3, 4, 5, 6]
>>> bs = set(b)                 # create set for O(len(b))
>>> [item in bs for item in a]  # check O(len(a)) items "in set of b" for O(1) each
[True, True, False, False]

Since most of these operations are builtin, the only costly operation is the iteration over a which is needed in all solutions.

However, this will duplicate the references to the items in b. If b is treated as external to the algorithm, the space complexity is O(m+n) instead of the ideal case O(n) for just the answer.

like image 36
MisterMiyagi Avatar answered Nov 04 '22 00:11

MisterMiyagi


Late answer, but a different approach to the problem using set() uniqueness and O(1) speed of len(), i. e. :

a_in_b = []
a = [1,4,7,8]
b = [1,2,3,4,5,6]
b_set = set(b) 
for v in a:
    l1 = len(b_set) 
    b_set.add(v) 
    a_in_b.append(l1 == len(b_set)) 

Unfortunately, my approach isn't the fastest:

  • mistermiyagi: 0.387 ms
  • tomerikoo: 0.442 ms
  • blckknght: 0.729 ms
  • lobito: 1.043 ms
  • semisecure: 1.87 ms
  • notnotparas: too long
  • lucky6qi: too long

Benchmark

like image 44
Pedro Lobito Avatar answered Nov 03 '22 23:11

Pedro Lobito


Use Binary Search here:

def bs(b,aele,start,end):
    if start > end:
        return False
    mid = (start + end) // 2
    if ale == b[mid]:
        return True

    if ale < b[mid]:
        return bs(b, aele, start, mid-1)
    else:
        return bs(b, aele, mid+1, end)

For each element in a check if it exists in b using this method. Time Complexity: O(m*log(n))

like image 21
notnotparas Avatar answered Nov 03 '22 23:11

notnotparas


Using sets the order doesn't even matter.

Turn b to a set (O(N)). Then iterate a (O(M)), and for each element check if it's in set_b (O(1)). This will give a time complexity of O(max(M, N)):

a = [1, 4, 7, 8]
b = [1, 2, 3, 4, 5, 6]

set_b = set(b)
res = []
for elem in a:
    res.append(elem in set_b)

This can of-course be shortened to a nifty list-comp:

res = [elem in set_b for elem in a]

Both give:

[True, True, False, False]

For your parenthesized request, simply iterate with enumerate instead:

for i, elem in enumerate(a):
    if elem in set_b:
        res.append(i)

Which will give [0, 1].

like image 32
Tomerikoo Avatar answered Nov 03 '22 22:11

Tomerikoo


You should use binary search algorithm(read about it if you don't know what it is).

The modified bin_search function has to return position right for which b[right] >= elem - the first element in b that is greater or equal than searched element from a. This position will be used as the left position for next bin_search call. Also bin_search returns True as a second argument if it have found elem in b

def bin_search(arr, elem, left):
    right = len(arr)
    while left < right:
        mid = (left+right)//2
        if arr[mid] == elem:
            return (mid, True)
        if arr[mid] < elem:
            left = mid + 1
        else:
            right = mid
    return (right, False)

def find_a_in_b(a, b):
    new_left = 0
    a_in_b = [False] * len(a)
    
    # we could have used enumerate but size of a is too large
    index = 0
    for i in a:
        new_left, a_in_b[index] = bin_search(b, i, new_left)
        index += 1
    return a_in_b

It's probably the best time

P.S. Forget it, i'm stupid and forgot about linear algorithm used in merge sort, so it's not the best

like image 21
Илья Кузнецов Avatar answered Nov 04 '22 00:11

Илья Кузнецов


Go through a and b once:

a_in_b = []
bstart = 0
for ai in a:
    print (ai,bstart)
    if bstart == len(b):
        a_in_b.append(False)
    else:
        for bi in b[bstart:]:
            print (ai, bi, bstart)
            if ai == bi:
                a_in_b.append(True)
                break
            elif ai > bi:
                if bstart < len(b):
                    bstart+=1
                if bstart == len(b):
                    a_in_b.append(False)
                continue
like image 22
lucky6qi Avatar answered Nov 03 '22 22:11

lucky6qi


The obvious solution is actually O(M + N):

a = [1, 1, 4, 7, 8]
b = [1, 2, 3, 4, 5, 6]
c = [0] * len(a) # Or use a dict to stash hits ..

j = 0

for i in range(0, len(a)):
  while j < len(b) - 1 and b[j] < a[i]:
    j += 1
  if b[j] == a[i]:
    c[i] = 1

print(c)

For each i in 0 ... N where N is length of a, only a unique partition / sub-sequence of b plus one border number is checked, making it O(M + N) all together.

like image 41
spinkus Avatar answered Nov 03 '22 23:11

spinkus