Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the median of the sum of the arrays

Two sorted arrays of length n are given and the question is to find, in O(n) time, the median of their sum array, which contains all the possible pairwise sums between every element of array A and every element of array B.

For instance: Let A[2,4,6] and B[1,3,5] be the two given arrays. The sum array is [2+1,2+3,2+5,4+1,4+3,4+5,6+1,6+3,6+5]. Find the median of this array in O(n).

Solving the question in O(n^2) is pretty straight-forward but is there any O(n) solution to this problem?

Note: This is an interview question asked to one of my friends and the interviewer was quite sure that it can be solved in O(n) time.

like image 491
Aditya Avatar asked Jun 26 '13 09:06

Aditya


People also ask

What is the median of an array?

If an array is sorted, median is the middle element of an array in case of odd number of elements in an array and when number of elements in an array is even than it will be an average of two middle elements.

How do you find the median of two arrays?

There are two cases: Case 1: m+n is odd, the median is at (m+n)/2 th index in the array obtained after merging both the arrays. Case 2: m+n is even, the median will be the average of elements at index ((m+n)/2 – 1) and (m+n)/2 in the array obtained after merging both the arrays.

How do you find the median of two arrays in Python?

Method 1 (Simply count while Merging) Use the merge procedure of merge sort. Keep track of count while comparing elements of two arrays. If count becomes n(For 2n elements), we have reached the median. Take the average of the elements at indexes n-1 and n in the merged array.

How do you find the median of an array in C++?

If the number of element in the array is odd, then find the (N/2)th element using nth_element() function as illustrated below and then the value at index (N/2) is the median of the given array.


2 Answers

The correct O(n) solution is quite complicated, and takes a significant amount of text, code and skill to explain and prove. More precisely, it takes 3 pages to do so convincingly, as can be seen in details here http://www.cse.yorku.ca/~andy/pubs/X+Y.pdf (found by simonzack in the comments).

It is basically a clever divide-and-conquer algorithm that, among other things, takes advantage of the fact that in a sorted n-by-n matrix, one can find in O(n) the amount of elements that are smaller/greater than a given number k. It recursively breaks down the matrix into smaller submatrixes (by taking only the odd rows and columns, resulting in a submatrix that has n/2 colums and n/2 rows) which combined with the step above, results in a complexity of O(n) + O(n/2) + O(n/4)... = O(2*n) = O(n). It is crazy!

I can't explain it better than the paper, which is why I'll explain a simpler, O(n logn) solution instead :).


O(n * logn) solution:

It's an interview! You can't get that O(n) solution in time. So hey, why not provide a solution that, although not optimal, shows you can do better than the other obvious O(n²) candidates?

I'll make use of the O(n) algorithm mentioned above, to find the amount of numbers that are smaller/greater than a given number k in a sorted n-by-n matrix. Keep in mind that we don't need an actual matrix! The Cartesian sum of two arrays of size n, as described by the OP, results in a sorted n-by-n matrix, which we can simulate by considering the elements of the array as follows:

a[3] = {1, 5, 9};
b[3] = {4, 6, 8};
//a + b:
{1+4, 1+6, 1+8,
 5+4, 5+6, 5+8,
 9+4, 9+6, 9+8}

Thus each row contains non-decreasing numbers, and so does each column. Now, pretend you're given a number k. We want to find in O(n) how many of the numbers in this matrix are smaller than k, and how many are greater. Clearly, if both values are less than (n²+1)/2, that means k is our median!

The algorithm is pretty simple:

int smaller_than_k(int k){
    int x = 0, j = n-1;
    for(int i = 0; i < n; ++i){
        while(j >= 0 && k <= a[i]+b[j]){
            --j;
        }
        x += j+1;
    }
    return x;
}

This basically counts how many elements fit the condition at each row. Since the rows and columns are already sorted as seen above, this will provide the correct result. And as both i and j iterate at most n times each, the algorithm is O(n) [Note that j does not get reset within the for loop]. The greater_than_k algorithm is similar.

Now, how do we choose k? That is the logn part. Binary Search! As has been mentioned in other answers/comments, the median must be a value contained within this array:

candidates[n] = {a[0]+b[n-1], a[1]+b[n-2],... a[n-1]+b[0]};.

Simply sort this array [also O(n*logn)], and run the binary search on it. Since the array is now in non-decreasing order, it is straight-forward to notice that the amount of numbers smaller than each candidate[i] is also a non-decreasing value (monotonic function), which makes it suitable for the binary search. The largest number k = candidate[i] whose result smaller_than_k(k) returns smaller than (n²+1)/2 is the answer, and is obtained in log(n) iterations:

int b_search(){
    int lo = 0, hi = n, mid, n2 = (n²+1)/2;
    while(hi-lo > 1){
        mid = (hi+lo)/2;
        if(smaller_than_k(candidate[mid]) < n2)
            lo = mid;
        else
            hi = mid;
    }
    return candidate[lo]; // the median
}
like image 67
i Code 4 Food Avatar answered Oct 01 '22 08:10

i Code 4 Food


Let's say the arrays are A = {A[1] ... A[n]}, and B = {B[1] ... B[n]}, and the pairwise sum array is C = {A[i] + B[j], where 1 <= i <= n, 1 <= j <= n} which has n^2 elements and we need to find its median.

Median of C must be an element of the array D = {A[1] + B[n], A[2] + B[n - 1], ... A[n] + B[1]}: if you fix A[i], and consider all the sums A[i] + B[j], you would see that the only A[i] + B[j = n + 1 - i] (which is one of D) could be the median. That is, it may not be the median, but if it is not, then all other A[i] + B[j] are also not median.

This can be proved by considering all B[j] and count the number of values that are lower and number of values that are greater than A[i] + B[j] (we can do this quite accurately because the two arrays are sorted -- the calculation is a bit messy thought). You'd see that for A[i] + B[n + 1 - j] these two counts are most "balanced".

The problem then reduces to finding median of D, which has only n elements. An algorithm such as Hoare's will work.

UPDATE: this answer is wrong. The real conclusion here is that the median is one of D's element, but then D's median is the not the same as C's median.

like image 42
Khanh Nguyen Avatar answered Oct 01 '22 08:10

Khanh Nguyen