Algorithm to determine indices i..j of array A containing all the elements of another array B

Tags:

algorithm

I came across this question on an interview questions thread. Here is the question:

Given two integer arrays A[1..n] and B[1..m], find the smallest window in A that contains all elements of B. In other words, find a pair <i, j> such that A[i..j] contains B[1..m].

If A doesn't contain all the elements of B, then i, j can be returned as -1. The integers in A need not be in the same order as they are in B. If there is more than one smallest window (different windows, but of the same size), it's enough to return one of them.

Example: A = [1,2,5,11,2,6,8,24,101,17,8] and B = [5,2,11,8,17]. The algorithm should return i = 2 (index of 5 in A) and j = 9 (index of 17 in A). (Note that this example uses 0-based indices, even though the arrays above were declared 1-based.)

Now I can think of two variations.

Let's suppose that B has duplicates.

  1. This variation doesn't consider the number of times each element occurs in B. It just checks for all the unique elements that occur in B and finds the smallest corresponding window in A that satisfies the above problem. For example, if A = [1,2,4,5,7] and B = [2,2,5], this variation doesn't care that there are two 2's in B; it just checks A for the unique integers in B, namely 2 and 5, and hence returns i=1, j=3.

  2. This variation accounts for duplicates in B. If there are two 2's in B, then it expects to see at least two 2's in A as well. If not, it returns -1,-1.
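To make the distinction concrete, here is a small Python sketch (illustrative only; `covers_variant1` and `covers_variant2` are hypothetical helper names, and `window` is a candidate slice of A):

```python
from collections import Counter

def covers_variant1(window, B):
    # Variant 1: every distinct value of B must appear at least once.
    return set(B) <= set(window)

def covers_variant2(window, B):
    # Variant 2: every value of B must appear at least as many times
    # as it does in B (multiplicities matter).
    need = Counter(B)
    have = Counter(window)
    return all(have[x] >= k for x, k in need.items())

print(covers_variant1([2, 4, 5], [2, 2, 5]))  # True
print(covers_variant2([2, 4, 5], [2, 2, 5]))  # False: only one 2
```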

When you answer, please do let me know which variation you are answering. Pseudocode should do. Please mention space and time complexity if it is tricky to calculate it. Mention if your solution assumes array indices to start at 1 or 0 too.

Thanks in advance.

asked May 29 '09 by Skylark




3 Answers

Complexity

Time: O((m+n)log m)

Space: O(m)

The following is provably optimal up to a logarithmic factor. (I believe the log factor cannot be removed, and so it's optimal.)

Variant 1 is just a special case of variant 2 with all the multiplicities equal to 1, after removing duplicates from B. So it's enough to handle the latter variant; if you want variant 1, just remove duplicates in O(m log m) time. In the following, let m denote the number of distinct elements in B. We assume m ≤ n, because otherwise we can just return -1, in constant time.

For each index i in A, we will find the smallest index s[i] such that A[i..s[i]] contains B[1..m], with the right multiplicities. The crucial observation is that s[i] is non-decreasing, and this is what allows us to do it in amortised linear time.

Start with i=j=1. We will keep a tuple (c[1], c[2], ... c[m]) of the number of times each element of B occurs, in the current window A[i..j]. We will also keep a set S of indices (a subset of 1..m) for which the count is "right" (i.e., k for which c[k]=1 in variant 1, or c[k] = <the right number> in variant 2).

So, for i=1, starting with j=1, increment each c[A[j]] (if A[j] is an element of B), check whether c[A[j]] is now "right", and add or remove the corresponding index of B from S accordingly. Stop when S has size m. You've now found s[1], in at most O(n log m) time. (There are O(n) values of j, and each set operation takes O(log m) time.)

Now for computing successive s[i]s, do the following. Decrement c[A[i]] (the element leaving the window), increment i, update S accordingly, and, if necessary, increment j until S has size m again. That gives you s[i] for each i. At the end, report the (i, s[i]) for which s[i] - i was smallest.
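Here is a sketch of this two-pointer scheme in Python (my illustration, not part of the original answer). It uses 0-based indices and hash maps, so each step is O(1) expected and the whole run is O(m + n); replacing the hash maps with a balanced-tree set gives the O((m + n) log m) bound above. The set S is represented implicitly by the counter `satisfied`:

```python
from collections import Counter

def smallest_window(A, B):
    """Return (i, j), 0-based inclusive, of the smallest window of A
    containing B with multiplicities (variant 2), or (-1, -1)."""
    need = Counter(B)        # required count per element
    have = Counter()         # counts inside the current window
    satisfied = 0            # number of keys k with have[k] >= need[k]
    best = (-1, -1)
    j = 0
    for i in range(len(A)):  # i is the left end of the window
        # advance the right end until the window covers B, if possible
        while satisfied < len(need) and j < len(A):
            x = A[j]
            if x in need:
                have[x] += 1
                if have[x] == need[x]:
                    satisfied += 1
            j += 1
        if satisfied < len(need):
            break            # no window starting at i (or later) covers B
        if best == (-1, -1) or (j - 1 - i) < (best[1] - best[0]):
            best = (i, j - 1)
        # drop A[i] before the left end moves right
        x = A[i]
        if x in need:
            if have[x] == need[x]:
                satisfied -= 1
            have[x] -= 1
    return best

# Example from the question (0-based: 5 is at index 2, 17 at index 9):
print(smallest_window([1, 2, 5, 11, 2, 6, 8, 24, 101, 17, 8],
                      [5, 2, 11, 8, 17]))   # (2, 9)
```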

Note that although it seems that you might be performing up to O(n) steps (incrementing j) for each i, the second pointer j only moves to the right: so the total number of times you can increment j is at most n. (This is amortised analysis.) Each time you increment j, you might perform a set operation that takes O(log m) time, so the total time is O(n log m). The space required was for keeping the tuple of counts, the set of elements of B, the set S, and some constant number of other variables, so O(m) in all.

There is an obvious O(m+n) lower bound, because you need to examine all the elements. So the only question is whether we can prove the log factor is necessary; I believe it is.

answered Oct 20 '22 by ShreevatsaR


Here is the solution I thought of (but it's not very neat).

I am going to illustrate it using the example in the question.

Let A[1,2,5,11,2,6,8,24,101,17,8] and B[5,2,11,8,17]

  1. Sort B. (So B = [2,5,8,11,17].) This step takes O(m log m).

  2. Allocate an array C of size n. Iterate through the elements of A, binary-searching for each one in the sorted B; if it is found, enter its "index in sorted B + 1" in C, and if it is not found, enter -1. After this step,

A = [1 , 2, 5, 11, 2, 6, 8, 24, 101, 17, 8] (no changes, quoting for ease).

C = [-1, 1, 2, 4 , 1, -1, 3, -1, -1, 5, 3]

Time: O(n log m), Space: O(n).

  3. Find the smallest window in C that has all the numbers from 1 to m. For finding the window, I can think of two general directions:

     a. A bit-oriented approach, where I set the bit corresponding to each position and finally check by some kind of ANDing.

     b. Create another array D of size m, go through C, and when I encounter p in C, increment D[p]. Use this for finding the window. (A sketch of this approach appears below.)
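A Python sketch of steps 1-3, using option (b) for the window search (my illustration, not the answerer's code; it handles variant 1, i.e. distinct elements of B, and uses 0-based indices):

```python
import bisect
from collections import Counter

def smallest_window_via_rank(A, B):
    # Step 1: sort the distinct values of B. O(m log m).
    sorted_B = sorted(set(B))
    m = len(sorted_B)

    # Step 2: map each A[k] to its rank in sorted_B (1..m), or -1.
    def rank(x):
        p = bisect.bisect_left(sorted_B, x)
        return p + 1 if p < m and sorted_B[p] == x else -1
    C = [rank(x) for x in A]

    # Step 3b: sliding window over C, counting ranks 1..m in D.
    D = Counter()
    distinct = 0             # how many ranks have a nonzero count
    best = (-1, -1)
    i = 0
    for j, r in enumerate(C):
        if r != -1:
            D[r] += 1
            if D[r] == 1:
                distinct += 1
        while distinct == m:           # window covers all of B; shrink it
            if best == (-1, -1) or j - i < best[1] - best[0]:
                best = (i, j)
            if C[i] != -1:
                D[C[i]] -= 1
                if D[C[i]] == 0:
                    distinct -= 1
            i += 1
    return best

print(smallest_window_via_rank([1, 2, 5, 11, 2, 6, 8, 24, 101, 17, 8],
                               [5, 2, 11, 8, 17]))  # (2, 9)
```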

Please leave comments regarding the general approach as such, as well as for 3a and 3b.

answered Oct 20 '22 by Skylark


My solution:

a. Create a hash table H with m keys, one for each distinct value in B. Each key in H maps to a dynamic array of sorted indices: the indices in A at which that value occurs. This takes O(n) time: we go through each index j in A, and if the key A[j] exists in H (an O(1) lookup), we append j to the list that H[A[j]] maps to.

At this point we have 'binned' n elements into m bins. However, total storage is just O(n).
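In Python, this binning step might look like the following sketch (names are mine, not the answerer's; indices are 1-based to match the worked example at the end):

```python
from collections import defaultdict

def bin_indices(A, B):
    """Map each distinct value of B to the sorted list of 1-based
    indices in A where that value occurs."""
    keys = set(B)
    H = defaultdict(list)
    for j, x in enumerate(A, start=1):   # O(n) total
        if x in keys:                    # O(1) expected per lookup
            H[x].append(j)               # indices arrive in sorted order
    return H

print(dict(bin_indices([5, 1, 1, 5, 6, 1, 1, 5], [5, 6])))
# {5: [1, 4, 8], 6: [5]}
```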

b. The 2nd part of the algorithm involves maintaining a 'left' index and a 'right' index for each list in H. Let's create two arrays of size m called L and R that contain these values; initially, L[i] = 1 (the first position in list i) and R[i] = the length of list i. (The worked example at the end shows concrete values.)

We also keep track of the “best” minimum window.

We then iterate over the following actions on L and R, which are inherently greedy:

  i. In each iteration, we compute the minimum and maximum values pointed to by L and by R. For L, Lmax - Lmin is the window, and for R, Rmax - Rmin is the window. We update the best window if either of these is better than the current best. We use a min-heap to keep track of the minimum element in L and a max-heap to keep track of the largest element in R; these take O(m*log(m)) time to build.

  ii. From a 'greedy' perspective, we want to take the action that will minimize the window size in L and in R: for L it intuitively makes sense to increment the minimum index, and for R it makes sense to decrement the maximum index.

We want to increment the array position for the minimum value until it is larger than the 2nd smallest element in L, and similarly, we want to decrement the array position for the largest value in R until it is smaller than the 2nd largest element in R.

Next, we make a key observation:

If L[i] is the minimum value in L and R[i] is less than the 2nd smallest element in L (that is, if R[i] would still be the minimum value in L were L[i] replaced by R[i]), then we are done: we have found the "best" index in list i that can contribute to the minimum window, and no other element of R can contribute to the best window, since their L values are all larger than L[i]. Similarly, if R[j] is the maximum element in R and L[j] is greater than the 2nd largest value in R, we are done after setting R[j] = L[j]: every index in list j to the left of L[j] has already been accounted for, as have all indices to the right of R[j], and any index between L[j] and R[j] would perform worse than L[j].

Otherwise, we simply increment the array position L[i] until its value is larger than the 2nd smallest element in L, and decrement the array position R[j] (where R[j] is the max in R) until its value is smaller than the 2nd largest element in R. We compute the windows and update the best window if one of the L or R windows is smaller than it. We can do a Fibonacci search to make the increment/decrement optimal: keep incrementing L[i] in Fibonacci-sized steps until the value is larger than the 2nd smallest element in L, then binary-search for the smallest L[i] whose value is larger than the 2nd smallest element in L; similarly for R. (A sketch of this step appears below.) After the increment/decrement, we pop the largest element from the max-heap for R and the minimum element from the min-heap for L, and insert the new values of L[i] and R[j] into the heaps. This is an O(log(m)) operation.
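The "Fibonacci increments plus binary search" idea is essentially galloping (exponential) search. Here is a minimal sketch of one such pointer advance (a hypothetical helper of my own; it doubles the step size rather than using Fibonacci numbers, which has the same asymptotics):

```python
import bisect

def gallop_past(lst, start, threshold):
    """Return the smallest index p >= start with lst[p] > threshold,
    or len(lst) if there is none; lst must be sorted ascending."""
    hi = start
    step = 1
    while hi < len(lst) and lst[hi] <= threshold:
        hi += step           # probe exponentially growing distances
        step *= 2
    # the answer lies in [start, min(hi, len(lst))]; binary-search it
    return bisect.bisect_right(lst, threshold, start, min(hi, len(lst)))

print(gallop_past([1, 4, 8], 0, 2))   # 1: lst[1] = 4 is the first value > 2
```

Advancing a pointer a distance d this way costs O(log d) rather than O(d).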

Step ii. terminates when Lmin can't move any further to the right or Rmax can't move any further to the left (because the corresponding L and R pointers are equal). Note that we can have scenarios in which L[i] = R[i]; if that entry is not the minimum element in L or the maximum element in R, the algorithm still continues.

Runtime analysis:

a. Creating the hash table takes O(n) time and O(n) space.

b. Creating the heaps: O(m*log(m)) time and O(m) space.

c. The greedy iterative algorithm is a little harder to analyze. Its runtime is really bounded by the distribution of elements. In the worst case, we cover all the elements in each array in the hash table, and for each element we perform an O(log(m)) heap update.

The worst-case runtime is hence O(n*log(m)) for the iterative greedy algorithm. In the best case, we discover very quickly that L[i] = R[i] for the minimum element in L or the maximum element in R, and the greedy phase takes only O(log(m))!

Average case seems really hard to analyze. What is the average “convergence” of this algorithm to the minimum window. If we were to assume that the Fibonacci increments / binary search were to help, we could say we only look at m*log(n/m) elements (every list has n/m elements) in the average case. In that case, the running time of the greedy algorithm would be m*log(n/m)*log(m).

Total running time:

Best case: O(n + m*log(m) + log(m)) = O(n), assuming m << n.

Average case: O(n + m*log(m) + m*log(n/m)*log(m)) = O(n), assuming m << n.

Worst case: O(n + n*log(m) + m*log(m)) = O(n*log(m)), assuming m << n.

Space: O(n + m) (hashtable and heaps) always.

Edit: Here is a worked out example:

A[5, 1, 1, 5, 6, 1, 1, 5] B[5, 6]

H: { 5 => [1, 4, 8], 6 => [5] }

Greedy Algorithm:

L => {1, 1} R => {3, 1}

Iteration 1: a. Lmin = 1 (since H{5}[1] = 1 < H{6}[1] = 5), Lmax = 5. Window: 5 - 1 + 1 = 5. Increment the Lmin pointer; it now becomes 2.

L => {2, 1}

Rmin = H{6}[1] = 5, Rmax = H{5}[3] = 8. Window = 8 - 5 + 1 = 4. Best window so far = 4 (less than 5 computed above). We also note the indices in A (5, 8) for the best window.

Decrement the Rmax pointer; it now becomes 2, and the value is H{5}[2] = 4.

R => {2, 1}

b. Now, Lmin = 4 (H{5}[2]) and the index i in L is 1. Lmax = 5 (H{6}[1]) and the index in L is 2. We can't increment Lmin since L[1] = R[1] = 2. Thus we just compute the window now.

The window = Lmax - Lmin + 1 = 2 which is the best window so far.

Thus, the best window in A = (4, 5).
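A brute-force check (my addition, not part of the original answer) confirms this result for the worked example; since B has no duplicates here, a plain set-containment test suffices:

```python
def brute_force(A, B):
    """O(n^2) reference: smallest window of A containing all values of B."""
    need = set(B)
    best = (-1, -1)
    n = len(A)
    for i in range(n):
        for j in range(i, n):
            if need <= set(A[i:j + 1]):
                if best == (-1, -1) or j - i < best[1] - best[0]:
                    best = (i, j)
                break   # longer windows starting at i can't be smaller
    # convert to 1-based indices to match the answer's convention
    return (best[0] + 1, best[1] + 1) if best != (-1, -1) else (-1, -1)

print(brute_force([5, 1, 1, 5, 6, 1, 1, 5], [5, 6]))  # (4, 5)
```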

answered Oct 20 '22 by Arnab