
Find the median of the sum of the arrays

Tags:

arrays

algorithm

median

Two sorted arrays of length n are given and the question is to find, in O(n) time, the median of their sum array, which contains all the possible pairwise sums between every element of array A and every element of array B.

For instance: let A = [2,4,6] and B = [1,3,5] be the two given arrays. The sum array is [2+1,2+3,2+5,4+1,4+3,4+5,6+1,6+3,6+5]. Find the median of this array in O(n).

Solving the question in O(n^2) is pretty straightforward, but is there any O(n) solution to this problem?
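For reference, the straightforward baseline can be sketched in C (the language of the answers' snippets). `median_bruteforce` and `cmp_int` are hypothetical names. This sketch materializes all n² sums and sorts them with `qsort`, so it is really O(n² log n). Replacing the sort with a linear-time selection would bring it down to O(n²). The question does not define the median for an even number of sums, so returning the lower median in that case is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints (avoids the overflow of returning x - y). */
static int cmp_int(const void *p, const void *q) {
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Brute-force baseline: build all n*n sums a[i] + b[j], sort them and
   return the element at 0-based index (n*n - 1) / 2, which is the median
   when n*n is odd and the lower median when it is even. */
int median_bruteforce(const int *a, const int *b, int n) {
    assert(n > 0);
    int total = n * n;
    int *sums = malloc((size_t)total * sizeof *sums);
    assert(sums != NULL);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            sums[i * n + j] = a[i] + b[j];
    qsort(sums, (size_t)total, sizeof *sums, cmp_int);
    int median = sums[(total - 1) / 2];
    free(sums);
    return median;
}
```

For A = [2,4,6] and B = [1,3,5], the sorted sums are 3, 5, 5, 7, 7, 7, 9, 9, 11, so `median_bruteforce(a, b, 3)` returns 7.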

Note: This is an interview question asked to one of my friends and the interviewer was quite sure that it can be solved in O(n) time.


asked Jun 26 '13 09:06

Aditya

2 Answers

The correct O(n) solution is quite complicated, and it takes a significant amount of text, code and skill to explain and prove. More precisely, it takes 3 pages to do so convincingly, as can be seen in detail here http://www.cse.yorku.ca/~andy/pubs/X+Y.pdf (found by simonzack in the comments).

It is basically a clever divide-and-conquer algorithm that, among other things, takes advantage of the fact that in a sorted n-by-n matrix, one can count in O(n) the number of elements that are smaller/greater than a given number k. It recursively breaks the matrix down into smaller submatrices (by taking only the odd rows and columns, resulting in a submatrix that has n/2 columns and n/2 rows), which, combined with the step above, results in a complexity of O(n) + O(n/2) + O(n/4) + ... = O(2n) = O(n). It is crazy!

I can't explain it better than the paper, which is why I'll explain a simpler, O(n log n) solution instead :).

O(n log n) solution:

It's an interview! You can't get that O(n) solution in time. So hey, why not provide a solution that, although not optimal, shows you can do better than the other obvious O(n²) candidates?

I'll make use of the O(n) algorithm mentioned above to count how many numbers are smaller/greater than a given number k in a sorted n-by-n matrix. Keep in mind that we don't need an actual matrix! The Cartesian sum of two sorted arrays of size n, as described by the OP, forms an n-by-n matrix with sorted rows and columns, which we can simulate by indexing the arrays as follows:

int a[3] = {1, 5, 9};
int b[3] = {4, 6, 8};
// a + b, viewed as a matrix whose row i holds a[i] + b[j]:
// {1+4, 1+6, 1+8,
//  5+4, 5+6, 5+8,
//  9+4, 9+6, 9+8}

Thus each row contains non-decreasing numbers, and so does each column. Now, pretend you're given a number k. We want to find in O(n) how many of the numbers in this matrix are smaller than k, and how many are greater. If n (and therefore n²) is odd and both counts are less than (n²+1)/2, then k is our median!

The algorithm is pretty simple:

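// Counts pairs (i, j) with a[i] + b[j] < k; a, b (sorted, length n) are globals.
int smaller_than_k(int k){
    int x = 0, j = n-1;
    for(int i = 0; i < n; ++i){
        // move left past the sums in row i that are >= k
        while(j >= 0 && k <= a[i]+b[j]){
            --j;
        }
        x += j+1; // b[0..j] give sums < k in row i
    }
    return x;
}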

This counts how many elements satisfy the condition in each row. Since the rows and columns are sorted, as seen above, the result is correct. And since i only increases and j only decreases, each moves at most n times, so the algorithm is O(n) [note that j is not reset inside the for loop]. The greater_than_k algorithm is similar.
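A minimal sketch of the "similar" greater_than_k, assuming sorted int arrays passed explicitly instead of the globals a, b, n used above. `count_less`, `count_greater` and `is_median` are hypothetical names. `count_greater` walks the rows from the bottom while its column pointer only moves right, so it is O(n) for the same reason as `smaller_than_k`. `is_median` encodes the "both counts below (n²+1)/2" test and, as written, only accepts odd n.

```c
#include <assert.h>

/* Number of pairs (i, j) with a[i] + b[j] < k (same walk as smaller_than_k). */
int count_less(const int *a, const int *b, int n, int k) {
    int x = 0, j = n - 1;
    for (int i = 0; i < n; ++i) {
        while (j >= 0 && a[i] + b[j] >= k)
            --j;
        x += j + 1;  /* b[0..j] give sums < k in row i */
    }
    return x;
}

/* Number of pairs (i, j) with a[i] + b[j] > k. Rows are visited from the
   largest a[i] down; the first column with a sum > k can only move right. */
int count_greater(const int *a, const int *b, int n, int k) {
    int x = 0, j = 0;
    for (int i = n - 1; i >= 0; --i) {
        while (j < n && a[i] + b[j] <= k)
            ++j;
        x += n - j;  /* b[j..n-1] give sums > k in row i */
    }
    return x;
}

/* For odd n, k is the median of the n*n sums exactly when fewer than
   (n*n + 1) / 2 sums are smaller than k and fewer are greater than k. */
int is_median(const int *a, const int *b, int n, int k) {
    assert(n % 2 == 1);
    int half = (n * n + 1) / 2;
    return count_less(a, b, n, k) < half && count_greater(a, b, n, k) < half;
}
```

With a = {1, 5, 9} and b = {4, 6, 8}, the value 11 has 4 smaller and 4 greater sums, so `is_median(a, b, 3, 11)` is true, while 9 has 5 greater sums and is rejected.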

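Now, how do we choose k? That is the log n part. Binary search! As has been mentioned in other answers/comments, the median is claimed to be a value contained within this array (this does not hold for every input: with a = {0, 1, 976, 1002, 1003} and b = {0, 10, 20, 30, 1000}, the median of the 25 sums is 1001, which is not among the candidates {1000, 31, 996, 1012, 1003}, so the search below is only correct when the claim holds):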

candidates[n] = {a[0]+b[n-1], a[1]+b[n-2], ..., a[n-1]+b[0]};

Simply sort this array [also O(n log n)] and run the binary search on it. Since the array is now in non-decreasing order, the number of sums smaller than candidates[i] is also non-decreasing in i (a monotonic function), which makes it suitable for binary search. The largest k = candidates[i] for which smaller_than_k(k) returns less than (n²+1)/2 is the answer, and it is obtained in log(n) iterations:

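int b_search(){
    // candidates[] must already be sorted in non-decreasing order
    int lo = 0, hi = n, mid, n2 = (n*n+1)/2;
    while(hi-lo > 1){
        mid = (hi+lo)/2;
        if(smaller_than_k(candidates[mid]) < n2)
            lo = mid; // fewer than n2 sums lie below candidates[mid]
        else
            hi = mid;
    }
    return candidates[lo]; // the median
}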
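A sketch of a different technique, swapped in here rather than taken from the answer. Assuming integer elements, it binary searches over the range of sum values instead of over candidates, using a "count sums <= v" variant of the same staircase walk. It never consults the candidates array, so it does not depend on the median being one of its elements. The cost is O(n log R), where R is the spread between the smallest and largest sum, rather than O(n log n). `count_at_most` and `median_by_value` are hypothetical names, and sums are computed in long long to avoid int overflow.

```c
#include <assert.h>

/* Number of pairs (i, j) with a[i] + b[j] <= v, in O(n).
   a and b must be sorted in non-decreasing order. */
static long long count_at_most(const int *a, const int *b, int n, long long v) {
    long long x = 0;
    int j = n - 1;
    for (int i = 0; i < n; ++i) {
        while (j >= 0 && (long long)a[i] + b[j] > v)
            --j;
        x += j + 1;  /* b[0..j] give sums <= v in row i */
    }
    return x;
}

/* Smallest v in [a[0] + b[0], a[n-1] + b[n-1]] with at least
   (n*n + 1) / 2 sums <= v. Such a v is always one of the sums: it is the
   median for odd n*n and the lower median for even n*n. */
long long median_by_value(const int *a, const int *b, int n) {
    assert(n > 0);
    long long need = ((long long)n * n + 1) / 2;
    long long lo = (long long)a[0] + b[0];
    long long hi = (long long)a[n - 1] + b[n - 1];
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        if (count_at_most(a, b, n, mid) >= need)
            hi = mid;      /* mid is large enough; the answer is <= mid */
        else
            lo = mid + 1;  /* too few sums <= mid; the answer is > mid */
    }
    return lo;
}
```

For a = {1, 5, 9} and b = {4, 6, 8}, the search converges on 11 after three probes (11, 8, 10).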

answered Oct 01 '22 08:10

i Code 4 Food

Let's say the arrays are A = {A[1] ... A[n]}, and B = {B[1] ... B[n]}, and the pairwise sum array is C = {A[i] + B[j], where 1 <= i <= n, 1 <= j <= n} which has n^2 elements and we need to find its median.

Median of C must be an element of the array D = {A[1] + B[n], A[2] + B[n - 1], ... A[n] + B[1]}: if you fix A[i] and consider all the sums A[i] + B[j], you would see that only A[i] + B[n + 1 - i] (which is an element of D) could be the median. That is, it may not be the median, but if it is not, then none of the other A[i] + B[j] is the median either.

This can be proved by considering all B[j] and counting the number of values that are lower and the number of values that are greater than A[i] + B[j] (we can do this quite accurately because the two arrays are sorted, though the calculation is a bit messy). You'd see that for A[i] + B[n + 1 - i] these two counts are the most "balanced".

The problem then reduces to finding the median of D, which has only n elements. A selection algorithm such as Hoare's (quickselect) will work.
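A minimal sketch of Hoare's selection algorithm (quickselect) applied to D, using 0-based indices so that d[i] = A[i] + B[n-1-i]. `quickselect` and `median_of_d` are hypothetical names. The middle-element pivot is an arbitrary choice that gives expected O(n) time with an O(n²) worst case, and for even n the lower median is assumed. The routine correctly computes the median of D, but as the update to this answer states, that value is not in general the median of C.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the element that would sit at 0-based index k if v[0..len-1]
   were sorted. Reorders v in place using Hoare partitioning. */
int quickselect(int *v, int len, int k) {
    assert(len > 0 && 0 <= k && k < len);
    int lo = 0, hi = len - 1;
    while (lo < hi) {
        int pivot = v[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) {
            while (v[i] < pivot) ++i;
            while (v[j] > pivot) --j;
            if (i <= j) {
                int t = v[i]; v[i] = v[j]; v[j] = t;
                ++i; --j;
            }
        }
        /* now v[lo..j] <= pivot, v[i..hi] >= pivot, and anything between equals pivot */
        if (k <= j) hi = j;
        else if (k >= i) lo = i;
        else return v[k];
    }
    return v[lo];
}

/* Median of D = {A[0] + B[n-1], ..., A[n-1] + B[0]} (lower median for even n). */
int median_of_d(const int *a, const int *b, int n) {
    int *d = malloc((size_t)n * sizeof *d);
    assert(d != NULL);
    for (int i = 0; i < n; ++i)
        d[i] = a[i] + b[n - 1 - i];
    int m = quickselect(d, n, (n - 1) / 2);
    free(d);
    return m;
}
```

For A = {1, 5, 9} and B = {4, 6, 8}, D = {9, 11, 13} and the result 11 happens to match C's median. For A = {0, 1, 976, 1002, 1003} and B = {0, 10, 20, 30, 1000}, it returns 1000, while the median of C is 1001.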

UPDATE: this answer is wrong. The real conclusion here is that the median is one of D's elements, but D's median is not the same as C's median.

answered Oct 01 '22 08:10

Khanh Nguyen
