How to find the pair with the kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array.) For example, with the arrays

  • [2, 3, 5, 8, 13]
  • [4, 8, 12, 16]

The pairs with the largest sums are:

  • 13 + 16 = 29
  • 13 + 12 = 25
  • 8 + 16 = 24
  • 13 + 8 = 21
  • 5 + 16 = 21

So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?

Also, what is the fastest algorithm? The arrays are already sorted, with sizes M and N.


I am already aware of the O(k log k) solution using a max-heap, given here.

It is also one of the favorite Google interview questions, and they demand an O(k) solution.

I've also read somewhere that there exists an O(k) solution, which I am unable to figure out.

Can someone explain the correct solution with pseudocode?

P.S. Please DON'T post this link as an answer/comment. It DOESN'T contain the answer.

asked Sep 01 '13 by Spandan


1 Answer

I'll start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many are less. This can be done by iterating over the arrays with two pointers: the pointer into the first array is incremented when the sum is too large, and the pointer into the second array is decremented when the sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search), we can find the Kth largest sum in O(N log R) time, where N is the size of the larger array and R is the number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by a small constant.
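Here is a minimal sketch of this first algorithm, assuming integer elements, ascending sort order, and 1 <= k <= M*N; the names countGreaterEqual and kthLargestSum are mine, not from the answer:

```cpp
#include <cstdint>
#include <vector>

// Count pairs (a[i], b[j]) with a[i] + b[j] >= value. Both arrays are
// sorted ascending. O(M + N): as x moves right through a, j only moves left.
int64_t countGreaterEqual(const std::vector<int64_t>& a,
                          const std::vector<int64_t>& b, int64_t value) {
    int64_t count = 0;
    int j = static_cast<int>(b.size()) - 1;
    for (int64_t x : a) {
        while (j >= 0 && x + b[j] >= value) --j;  // j: last index with sum < value
        count += static_cast<int64_t>(b.size()) - 1 - j;
    }
    return count;
}

// Binary search on the sum value: the answer is the largest v such that
// at least k pair sums are >= v. O((M + N) log R) overall.
int64_t kthLargestSum(const std::vector<int64_t>& a,
                      const std::vector<int64_t>& b, int64_t k) {
    int64_t lo = a.front() + b.front();  // smallest possible sum
    int64_t hi = a.back() + b.back();    // largest possible sum
    while (lo < hi) {
        int64_t mid = lo + (hi - lo + 1) / 2;
        if (countGreaterEqual(a, b, mid) >= k)
            lo = mid;                    // still at least k sums >= mid
        else
            hi = mid - 1;
    }
    return lo;
}
```

With the example arrays above, kthLargestSum({2, 3, 5, 8, 13}, {4, 8, 12, 16}, 4) returns 21, the sum of the pair (13, 8).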

The previous algorithm can be improved if we stop the binary search as soon as the number of pair sums in the binary search range decreases from O(N²) to O(N). Then we fill an auxiliary array with these pair sums (this can be done with a slightly modified two-pointer algorithm), and we use the quickselect algorithm to find the Kth largest sum in this auxiliary array. All this does not improve worst-case complexity, because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get a proper value range) use something better than binary search?
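A sketch of that collection step, under the same assumptions as before (sumsInRange is a hypothetical helper name). Once the search has narrowed to a half-open range [lo, hi) containing O(N) sums, collect them and quickselect:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Collect every pair sum s with lo <= s < hi. For each element of a, the
// valid indices into b form a contiguous window, and both window ends move
// only left as we advance through a, so the total work is
// O(M + N + number of collected sums).
std::vector<int64_t> sumsInRange(const std::vector<int64_t>& a,
                                 const std::vector<int64_t>& b,
                                 int64_t lo, int64_t hi) {
    std::vector<int64_t> out;
    int upper = static_cast<int>(b.size()) - 1;  // last j with x + b[j] < hi
    int lower = static_cast<int>(b.size());      // first j with x + b[j] >= lo
    for (int64_t x : a) {
        while (upper >= 0 && x + b[upper] >= hi) --upper;
        while (lower > 0 && x + b[lower - 1] >= lo) --lower;
        for (int j = lower; j <= upper; ++j) out.push_back(x + b[j]);
    }
    return out;
}

// Usage: if cntHi sums are >= hi and cntHi < k, the answer is the
// (k - cntHi)-th largest sum inside the window:
//   std::vector<int64_t> w = sumsInRange(a, b, lo, hi);
//   std::nth_element(w.begin(), w.begin() + (k - cntHi - 1), w.end(),
//                    std::greater<int64_t>());
//   int64_t answer = w[k - cntHi - 1];
```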

We could estimate the value range with the following trick: take every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this gives some approximation of the needed value range, and in fact a slightly improved variant of this trick gives a range containing only O(N) elements. This is proven in the following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. The paper contains a detailed explanation of the algorithm, a proof, a complexity analysis, and pseudo-code for every part of the algorithm except quickselect. If linear worst-case complexity is required, quickselect can be augmented with the median-of-medians algorithm.

This algorithm has complexity O(N). If one of the arrays is shorter than the other (M < N), we can assume the shorter array is extended to size N with some very small elements, so that all calculations in the algorithm use the size of the larger array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes the algorithm a little faster but does not improve the asymptotic complexity.

If k < N, we can ignore all the array elements with index greater than k; in this case the complexity is O(k). If N < k < N(N-1), we simply get better complexity than requested in the OP. If k > N(N-1), we'd better solve the opposite problem: the k'th smallest sum.

I uploaded a simple C++11 implementation to ideone. The code is not optimized and not thoroughly tested; I tried to keep it as close as possible to the pseudo-code in the linked paper. This implementation uses std::nth_element, which guarantees linear complexity only on average (not worst-case).


A completely different approach to finding the K'th sum in linear time is based on a priority queue (PQ). One variation is to insert the largest pair into the PQ, then repeatedly remove the top of the PQ and insert up to two pairs in its place (one with a decremented index in one array, the other with a decremented index in the other array), taking some measures to prevent inserting duplicate pairs. The other variation is to insert all possible pairs containing the largest element of the first array, then repeatedly remove the top of the PQ and insert in its place the pair with a decremented index in the first array and the same index in the second array. In this case there is no need to bother about duplicates.
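A minimal sketch of the second variation using an ordinary binary heap (so roughly O((N + K) log N) rather than the O(1)-PQ variant described below); kthLargestSumPQ is my own name, and it assumes 1 <= k <= M*N:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Seed the heap with every pair that uses the largest element of a, then
// on each pop push the pair one step down the same "column" of a. Each
// index pair (i, j) is generated exactly once, so no duplicate handling
// is needed.
int64_t kthLargestSumPQ(const std::vector<int64_t>& a,
                        const std::vector<int64_t>& b, int64_t k) {
    struct Entry { int64_t sum; int i, j; };
    auto less = [](const Entry& x, const Entry& y) { return x.sum < y.sum; };
    std::priority_queue<Entry, std::vector<Entry>, decltype(less)> pq(less);

    const int m = static_cast<int>(a.size());
    for (int j = 0; j < static_cast<int>(b.size()); ++j)
        pq.push({a[m - 1] + b[j], m - 1, j});  // one seed entry per column

    while (true) {
        Entry top = pq.top();
        pq.pop();
        if (--k == 0) return top.sum;          // this pop was the kth largest
        if (top.i > 0)                         // walk down the same column
            pq.push({a[top.i - 1] + b[top.j], top.i - 1, top.j});
    }
}
```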

OP mentions the O(K log K) solution where the PQ is implemented as a max-heap. But in some cases (when the array elements are evenly distributed integers with a limited range, and linear complexity is needed only on average, not worst-case) we could use an O(1)-time priority queue, for example as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This gives O(K) expected time complexity.

An advantage of this approach is the possibility of producing the first K elements in sorted order. The disadvantages are the limited choice of array element types, a more complex and slower algorithm, and worse asymptotic complexity: O(K) > O(N).

answered Sep 30 '22 by Evgeny Kluev