Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find common elements in two sorted arrays [duplicate]

Possible Duplicate:
The intersection of two sorted arrays

We have two sorted arrays A and B, besides compare one with all the elements in other array, how to design a best algorithm to find the array with their common elements?

like image 450
user1686630 Avatar asked Oct 20 '12 23:10

user1686630


3 Answers

Hold two pointers: one for each array.

i <- 0, j <- 0
repeat while i < length(arr1) and j < length(arr2):
    if arr1[i] > arr2[j]: increase j
    else if arr1[i] < arr2[j]: increase i
    else : output arr[i], increase both pointers

The idea is, if the data is sorted, if the element is "too big" in one array, it will be "too big" for all other elements left in the array - since it is sorted.

This solution requires a single traversal on the data. O(n) (with good constants as well).

like image 143
amit Avatar answered Oct 04 '22 19:10

amit


If the lengths of two arrays (say, A has N elements and B has M elements) are similar, then the best approach would be to perform linear search of one array's elements in another array. Of course, since the arrays are sorted, the next search should begin where the previous search has stopped. This is the classic principle used in "sorted array merge" algorithm. The complexity on O(N + M).

If the lengths are significantly different (say, M << N), then a much more optimal approach would be to iterate through elements of the shorter array and use binary search to look for these values in the longer array. The complexity is O(M * log N) in that case.

As you can see O(M * log N) is better than O(N + M) if M is much smaller than N, and worse otherwise.

The difference in array sizes which should trigger the switch from one approach to another depends on some practical considerations. If should be chosen based on practical experiments with your data.

These two approaches (linear and binary searches) can be "blended" into a single algorithm. Let's assume M <= N. In that case let's choose step value S = [N / M]. You take first element from array A and perform a straddled linear search for that element in array B with step S, meaning that you check elements B[0], B[S], B[2*S], B[3*S], ... and so on. Once you find the index range [S*i, S*(i+1)] that potentially contains the element you are searching for, you switch to binary search inside that segment of array B. Done. The straddled linear search for the next element of A begins where the previous search left off. (As a side note, it might make sense to choose the value of S equal to a power of 2).

This "blended" algorithm is the most asymptotically optimal search/merge algorithm for two sorted arrays in existence. However, in practice the more simple approach with choosing either binary or linear search depending on relative sizes of the arrays works perfectly well.

like image 39
AnT Avatar answered Oct 04 '22 19:10

AnT


besides compare one with all the elements in other array

You will have to compare A[] to B[] in order to know that they are the same -- unless you know a lot about what kind of data they can hold. The nature of the comparison probably has many solutions and can be optimized as required.

If the arrays are very strictly created ie only sequential values of a known pattern and always starts from a known point you could just look at the length of each array and know whether or not all items are common.

This unfortunately doesn't sound like a very realistic or useful array and so you are back to checking for A[i] in B[]

like image 29
Ryan Tennill Avatar answered Oct 04 '22 19:10

Ryan Tennill