Algorithm to determine indices i..j of array A containing all the elements of another array B

Tags:

algorithm

I came across this question on an interview questions thread. Here is the question:

Given two integer arrays A[1..n] and B[1..m], find the smallest window in A that contains all elements of B. In other words, find a pair <i, j> such that A[i..j] contains B[1..m].

If A doesn't contain all the elements of B, then i, j can be returned as -1. The integers in A need not be in the same order as they are in B. If there is more than one smallest window (different windows, but of the same size), it's enough to return one of them.

Example: A = [1,2,5,11,2,6,8,24,101,17,8] and B = [5,2,11,8,17]. The algorithm should return i = 2 (index of 5 in A) and j = 9 (index of 17 in A). (Note that this example uses 0-based indices, even though the arrays above were declared 1-based.)

Now I can think of two variations.

Let's suppose that B has duplicates.

  1. This variation doesn't consider the number of times each element occurs in B. It just checks for all the unique elements that occur in B and finds the smallest corresponding window in A that satisfies the above problem. For example, if A = [1,2,4,5,7] and B = [2,2,5], this variation doesn't care that there are two 2's in B; it just checks A for the unique integers in B, namely 2 and 5, and hence returns i=1, j=3.

  2. This variation accounts for duplicates in B. If there are two 2's in B, then it expects to see at least two 2's in A as well. If not, it returns -1,-1.
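To make the distinction concrete, here is a small Python sketch (illustrative only; `covers_variant1` and `covers_variant2` are hypothetical helper names, and `window` is a candidate slice of A):

```python
from collections import Counter

def covers_variant1(window, B):
    # Variant 1: every distinct value of B must appear at least once.
    return set(B) <= set(window)

def covers_variant2(window, B):
    # Variant 2: every value of B must appear at least as many times
    # as it does in B (multiplicities matter).
    need = Counter(B)
    have = Counter(window)
    return all(have[x] >= k for x, k in need.items())

print(covers_variant1([2, 4, 5], [2, 2, 5]))  # True
print(covers_variant2([2, 4, 5], [2, 2, 5]))  # False: only one 2
```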

When you answer, please do let me know which variation you are answering. Pseudocode should do. Please mention space and time complexity if it is tricky to calculate it. Mention if your solution assumes array indices to start at 1 or 0 too.

Thanks in advance.

asked May 29 '09 by Skylark




3 Answers

Complexity

Time: O((m+n)log m)

Space: O(m)

The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be removed, and so it's optimal.)

Variant 1 is just a special case of variant 2 with all the multiplicities equal to 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m ≤ n, because otherwise we can just return -1, in constant time.

For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.

Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs, in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k]=1 in variant 1, or c[k] = <the right number> in variant 2).

So, for i=1, starting with j=1, increment each c[A[j]] (if A[j] is an element of B), check whether c[A[j]] is now "right", and add or remove the corresponding index of B from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) values of j, and each set operation takes O(log m) time.)

Now for computing successive s[i]s, do the following. Decrement c[A[i]] (the element leaving the window), increment i, update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i, s[i]) for which s[i] - i was smallest.
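Here is a sketch of this two-pointer scheme in Python (my illustration, not part of the original answer). It uses 0-based indices and hash maps, so each step is O(1) expected and the whole run is O(m + n); replacing the hash maps with a balanced-tree set gives the O((m + n) log m) bound above. The set S is represented implicitly by the counter `satisfied`:

```python
from collections import Counter

def smallest_window(A, B):
    """Return (i, j), 0-based inclusive, of the smallest window of A
    containing B with multiplicities (variant 2), or (-1, -1)."""
    need = Counter(B)        # required count per element
    have = Counter()         # counts inside the current window
    satisfied = 0            # number of keys k with have[k] >= need[k]
    best = (-1, -1)
    j = 0
    for i in range(len(A)):  # i is the left end of the window
        # advance the right end until the window covers B, if possible
        while satisfied < len(need) and j < len(A):
            x = A[j]
            if x in need:
                have[x] += 1
                if have[x] == need[x]:
                    satisfied += 1
            j += 1
        if satisfied < len(need):
            break            # no window starting at i (or later) covers B
        if best == (-1, -1) or (j - 1 - i) < (best[1] - best[0]):
            best = (i, j - 1)
        # drop A[i] before the left end moves right
        x = A[i]
        if x in need:
            if have[x] == need[x]:
                satisfied -= 1
            have[x] -= 1
    return best

# Example from the question (0-based: 5 is at index 2, 17 at index 9):
print(smallest_window([1, 2, 5, 11, 2, 6, 8, 24, 101, 17, 8],
                      [5, 2, 11, 8, 17]))   # (2, 9)
```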

Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.

There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.

answered Oct 20 '22 by ShreevatsaR


Here is the solution I thought of (but it's not very neat).

I am going to illustrate it using the example in the question.

Let A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]

  1. Sort B. (So B = [2,5,8,11,17].) This step takes O(m log m).

  2. Allocate an array C of size n. Iterate through the elements of A, binary-searching for each one in the sorted B; if it is found, enter its "index in sorted B + 1" in C, and if it is not found, enter -1. After this step,

A = [1 , 2, 5, 11, 2, 6, 8, 24, 101, 17, 8] (no changes, quoting for ease).

C = [-1, 1, 2, 4 , 1, -1, 3, -1, -1, 5, 3]

Time: O(n log m), Space: O(n).

  3. Find the smallest window in C that has all the numbers from 1 to m. For finding the window, I can think of two general directions:

     a. A bit-oriented approach, where I set the bit corresponding to each position and finally check by some kind of ANDing.

     b. Create another array D of size m, go through C, and when I encounter p in C, increment D[p]. Use this for finding the window. (A sketch of this approach appears below.)
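A Python sketch of steps 1-3, using option (b) for the window search (my illustration, not the answerer's code; it handles variant 1, i.e. distinct elements of B, and uses 0-based indices):

```python
import bisect
from collections import Counter

def smallest_window_via_rank(A, B):
    # Step 1: sort the distinct values of B. O(m log m).
    sorted_B = sorted(set(B))
    m = len(sorted_B)

    # Step 2: map each A[k] to its rank in sorted_B (1..m), or -1.
    def rank(x):
        p = bisect.bisect_left(sorted_B, x)
        return p + 1 if p < m and sorted_B[p] == x else -1
    C = [rank(x) for x in A]

    # Step 3b: sliding window over C, counting ranks 1..m in D.
    D = Counter()
    distinct = 0             # how many ranks have a nonzero count
    best = (-1, -1)
    i = 0
    for j, r in enumerate(C):
        if r != -1:
            D[r] += 1
            if D[r] == 1:
                distinct += 1
        while distinct == m:           # window covers all of B; shrink it
            if best == (-1, -1) or j - i < best[1] - best[0]:
                best = (i, j)
            if C[i] != -1:
                D[C[i]] -= 1
                if D[C[i]] == 0:
                    distinct -= 1
            i += 1
    return best

print(smallest_window_via_rank([1, 2, 5, 11, 2, 6, 8, 24, 101, 17, 8],
                               [5, 2, 11, 8, 17]))  # (2, 9)
```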

Please leave comments regarding the general approach as such, as well as for 3a and 3b.

answered Oct 20 '22 by Skylark


My solution:

a. Create a hash table H with m keys, one for each distinct value in B. Each key in H maps to a dynamic array of sorted indices: the indices in A at which that value occurs. This takes O(n) time: we go through each index j in A, and if the key A[j] exists in H (an O(1) lookup), we append j to the list that H[A[j]] maps to.

At this point we have 'binned' n elements into m bins. However, total storage is just O(n).
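In Python, this binning step might look like the following sketch (names are mine, not the answerer's; indices are 1-based to match the worked example at the end):

```python
from collections import defaultdict

def bin_indices(A, B):
    """Map each distinct value of B to the sorted list of 1-based
    indices in A where that value occurs."""
    keys = set(B)
    H = defaultdict(list)
    for j, x in enumerate(A, start=1):   # O(n) total
        if x in keys:                    # O(1) expected per lookup
            H[x].append(j)               # indices arrive in sorted order
    return H

print(dict(bin_indices([5, 1, 1, 5, 6, 1, 1, 5], [5, 6])))
# {5: [1, 4, 8], 6: [5]}
```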

b. The 2nd part of the algorithm involves maintaining a 'left' index and a 'right' index for each list in H. Let's create two arrays of size m called L and R that contain these values; initially, L[i] = 1 (the first position in list i) and R[i] = the length of list i. (The worked example at the end shows concrete values.)

We also keep track of the “best” minimum window.

We then iterate over the following actions on L and R, which are inherently greedy:

  i. In each iteration, we compute the minimum and maximum values pointed to by L and by R. For L, Lmax - Lmin is the window, and for R, Rmax - Rmin is the window. We update the best window if either of these is better than the current best. We use a min-heap to keep track of the minimum element in L and a max-heap to keep track of the largest element in R; these take O(m*log(m)) time to build.

  ii. From a 'greedy' perspective, we want to take the action that will minimize the window size in L and in R: for L it intuitively makes sense to increment the minimum index, and for R it makes sense to decrement the maximum index.

We want to increment the array position for the minimum value until it is larger than the 2nd smallest element in L, and similarly, we want to decrement the array position for the largest value in R until it is smaller than the 2nd largest element in R.

Next, we make a key observation:

If L[i] is the minimum value in L and R[i] is less than the 2nd smallest element in L (that is, if R[i] would still be the minimum value in L were L[i] replaced by R[i]), then we are done: we have found the "best" index in list i that can contribute to the minimum window, and no other element of R can contribute to the best window, since their L values are all larger than L[i]. Similarly, if R[j] is the maximum element in R and L[j] is greater than the 2nd largest value in R, we are done after setting R[j] = L[j]: every index in list j to the left of L[j] has already been accounted for, as have all indices to the right of R[j], and any index between L[j] and R[j] would perform worse than L[j].

Otherwise, we simply increment the array position L[i] until its value is larger than the 2nd smallest element in L, and decrement the array position R[j] (where R[j] is the max in R) until its value is smaller than the 2nd largest element in R. We compute the windows and update the best window if one of the L or R windows is smaller than it. We can do a Fibonacci search to make the increment/decrement optimal: keep incrementing L[i] in Fibonacci-sized steps until the value is larger than the 2nd smallest element in L, then binary-search for the smallest L[i] whose value is larger than the 2nd smallest element in L; similarly for R. (A sketch of this step appears below.) After the increment/decrement, we pop the largest element from the max-heap for R and the minimum element from the min-heap for L, and insert the new values of L[i] and R[j] into the heaps. This is an O(log(m)) operation.
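The "Fibonacci increments plus binary search" idea is essentially galloping (exponential) search. Here is a minimal sketch of one such pointer advance (a hypothetical helper of my own; it doubles the step size rather than using Fibonacci numbers, which has the same asymptotics):

```python
import bisect

def gallop_past(lst, start, threshold):
    """Return the smallest index p >= start with lst[p] > threshold,
    or len(lst) if there is none; lst must be sorted ascending."""
    hi = start
    step = 1
    while hi < len(lst) and lst[hi] <= threshold:
        hi += step           # probe exponentially growing distances
        step *= 2
    # the answer lies in [start, min(hi, len(lst))]; binary-search it
    return bisect.bisect_right(lst, threshold, start, min(hi, len(lst)))

print(gallop_past([1, 4, 8], 0, 2))   # 1: lst[1] = 4 is the first value > 2
```

Advancing a pointer a distance d this way costs O(log d) rather than O(d).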

Step ii. terminates when Lmin can't move any further to the right or Rmax can't move any further to the left (because the corresponding L and R pointers are equal). Note that we can have scenarios in which L[i] = R[i]; if that entry is not the minimum element in L or the maximum element in R, the algorithm still continues.

Runtime analysis:

a. Creating the hash table takes O(n) time and O(n) space.

b. Creating the heaps: O(m*log(m)) time and O(m) space.

c. The greedy iterative algorithm is a little harder to analyze. Its runtime is really bounded by the distribution of elements. In the worst case, we cover all the elements in each array in the hash table, and for each element we perform an O(log(m)) heap update.

The worst-case runtime is hence O(n*log(m)) for the iterative greedy algorithm. In the best case, we discover very quickly that L[i] = R[i] for the minimum element in L or the maximum element in R, and the greedy phase takes only O(log(m))!

Average case seems really hard to analyze. What is the average “convergence” of this algorithm to the minimum window. If we were to assume that the Fibonacci increments / binary search were to help, we could say we only look at m*log(n/m) elements (every list has n/m elements) in the average case. In that case, the running time of the greedy algorithm would be m*log(n/m)*log(m).

Total running time:

Best case: O(n + m*log(m) + log(m)) = O(n), assuming m << n.

Average case: O(n + m*log(m) + m*log(n/m)*log(m)) = O(n), assuming m << n.

Worst case: O(n + n*log(m) + m*log(m)) = O(n*log(m)), assuming m << n.

Space: O(n + m) (hashtable and heaps) always.

Edit: Here is a worked out example:

A[5, 1, 1, 5, 6, 1, 1, 5] B[5, 6]

H: { 5 => [1, 4, 8], 6 => [5] }

Greedy Algorithm:

L => {1, 1} R => {3, 1}

Iteration 1: a. Lmin = 1 (since H{5}[1] = 1 < H{6}[1] = 5), Lmax = 5. Window: 5 - 1 + 1 = 5. Increment the Lmin pointer; it now becomes 2.

L => {2, 1}

Rmin = H{6}[1] = 5, Rmax = H{5}[3] = 8. Window = 8 - 5 + 1 = 4. Best window so far = 4 (less than 5 computed above). We also note the indices in A (5, 8) for the best window.

Decrement the Rmax pointer; it now becomes 2, and the value is H{5}[2] = 4.

R => {2, 1}

b. Now, Lmin = 4 (H{5}[2]) and the index i in L is 1. Lmax = 5 (H{6}[1]) and the index in L is 2. We can't increment Lmin since L[1] = R[1] = 2. Thus we just compute the window now.

The window = Lmax - Lmin + 1 = 2 which is the best window so far.

Thus, the best window in A = (4, 5).
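A brute-force check (my addition, not part of the original answer) confirms this result for the worked example; since B has no duplicates here, a plain set-containment test suffices:

```python
def brute_force(A, B):
    """O(n^2) reference: smallest window of A containing all values of B."""
    need = set(B)
    best = (-1, -1)
    n = len(A)
    for i in range(n):
        for j in range(i, n):
            if need <= set(A[i:j + 1]):
                if best == (-1, -1) or j - i < best[1] - best[0]:
                    best = (i, j)
                break   # longer windows starting at i can't be smaller
    # convert to 1-based indices to match the answer's convention
    return (best[0] + 1, best[1] + 1) if best != (-1, -1) else (-1, -1)

print(brute_force([5, 1, 1, 5, 6, 1, 1, 5], [5, 6]))  # (4, 5)
```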

answered Oct 20 '22 by Arnab