Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find intersection of two sorted arrays which in some cases require less than O(m+n) comparisons

Here is one way of doing this in O(m+n) where m and n are lengths of two arrays:

import random

def comm_seq(arr_1, arr_2):
    if len(arr_1) == 0 or len(arr_2) == 0:
        return []

    m = len(arr_1) - 1
    n = len(arr_2) - 1

    if arr_1[m] == arr_2[n]:
        return comm_seq(arr_1[:-1], arr_2[:-1]) + [arr_1[m]]

    elif arr_1[m] < arr_2[n]:
        return comm_seq(arr_1, arr_2[:-1])

    elif arr_1[m] > arr_2[n]:
        return comm_seq(arr_1[:-1], arr_2)


if __name__ == "__main__":
    arr_1 = [random.randrange(0,5) for _ in xrange(10)]
    arr_2 = [random.randrange(0,5) for _ in xrange(10)]
    arr_1.sort()
    arr_2.sort()
    print comm_seq(arr_1, arr_2)

Is there a technique that in some cases uses less than O(m+n) comparisons? For example: arr_1=[1,2,2,2,2,2,2,2,2,2,2,100] and arr_2=[1,3,100]

(Not looking for the hash table implementation)

like image 393
ajmartin Avatar asked Nov 22 '12 03:11

ajmartin


2 Answers

A binary search algorithm requires O(logm) time to find a number in an array with length m. Therefore, if we search each number of an array with length n from an array with length m, its overall time complexity is O(nlogm). If m is much greater than n, O(nlogm) is actually less than O(m+n). Therefore, we can implement a new and better solution based on binary search in such a situation. source

However, this does not necessarily means binary search is better in than O(m+n) case. In fact, binary search approach is only better when n << m (n is very small compared to m).

like image 86
dreamcrash Avatar answered Nov 14 '22 22:11

dreamcrash


As far as I know, there are a few different ways to solve this problem, but none of them are better than O(m + n). I don't know how you can have an algorithm faster than that (barring weird quantum computing answers), because you have to compare all the elements in both arrays or you might miss a duplicate.

Brute Force Use two nested for loops. Take every element from the first array and linear search it in the second array. O(M*N) time, O(1) space

Map Lookup Use a lookup structure like a hashtable or a binary search tree. Put all of the first array into the map structure, then loop through all of the second array and look up each element in the map to see if it exists. This works whether the arrays are sorted or not. O(M*log(M) + N*log(M)) for Binary Search Tree time or O(M + N) time for Hashtable, both are O(M) space.

Binary Search Like brute force, but take every element from the first array and binary search it in the second array. O(m*log(N)) time, O(1) space

Parallel Walk Like the merge part of merge sort. Have two pointers start at the front of each of the arrays. Compare the two elements, if they're equal store the duplicate, otherwise advance the pointer to the smaller value by one spot and repeat until you hit the end of one of the arrays. O(M + N) time, O(1) space

Regardless, you must examine every element in both arrays or you won't know if you've found all the duplicates. You could argue fringe cases where one array is a lot bigger or a lot smaller, but that won't hold for an alogrithm where you're considering all ranges of input.

like image 20
Brent Writes Code Avatar answered Nov 14 '22 23:11

Brent Writes Code