Possible Duplicate:
The intersection of two sorted arrays
We have two sorted arrays A and B, besides compare one with all the elements in other array, how to design a best algorithm to find the array with their common elements?
Hold two pointers: one for each array.
i <- 0, j <- 0
repeat while i < length(arr1) and j < length(arr2):
if arr1[i] > arr2[j]: increase j
else if arr1[i] < arr2[j]: increase i
else : output arr[i], increase both pointers
The idea is, if the data is sorted, if the element is "too big" in one array, it will be "too big" for all other elements left in the array - since it is sorted.
This solution requires a single traversal on the data. O(n)
(with good constants as well).
If the lengths of two arrays (say, A
has N
elements and B
has M
elements) are similar, then the best approach would be to perform linear search of one array's elements in another array. Of course, since the arrays are sorted, the next search should begin where the previous search has stopped. This is the classic principle used in "sorted array merge" algorithm. The complexity on O(N + M)
.
If the lengths are significantly different (say, M << N
), then a much more optimal approach would be to iterate through elements of the shorter array and use binary search to look for these values in the longer array. The complexity is O(M * log N)
in that case.
As you can see O(M * log N)
is better than O(N + M)
if M
is much smaller than N
, and worse otherwise.
The difference in array sizes which should trigger the switch from one approach to another depends on some practical considerations. If should be chosen based on practical experiments with your data.
These two approaches (linear and binary searches) can be "blended" into a single algorithm. Let's assume M <= N
. In that case let's choose step value S = [N / M]
. You take first element from array A
and perform a straddled linear search for that element in array B
with step S
, meaning that you check elements B[0], B[S], B[2*S], B[3*S], ...
and so on. Once you find the index range [S*i, S*(i+1)]
that potentially contains the element you are searching for, you switch to binary search inside that segment of array B
. Done. The straddled linear search for the next element of A
begins where the previous search left off. (As a side note, it might make sense to choose the value of S
equal to a power of 2).
This "blended" algorithm is the most asymptotically optimal search/merge algorithm for two sorted arrays in existence. However, in practice the more simple approach with choosing either binary or linear search depending on relative sizes of the arrays works perfectly well.
besides compare one with all the elements in other array
You will have to compare A[] to B[] in order to know that they are the same -- unless you know a lot about what kind of data they can hold. The nature of the comparison probably has many solutions and can be optimized as required.
If the arrays are very strictly created ie only sequential values of a known pattern and always starts from a known point you could just look at the length of each array and know whether or not all items are common.
This unfortunately doesn't sound like a very realistic or useful array and so you are back to checking for A[i] in B[]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With