How to find the pair with the kth largest sum?

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array.) For example, with the arrays

  • [2, 3, 5, 8, 13]
  • [4, 8, 12, 16]

The pairs with the largest sums are:

  • 13 + 16 = 29
  • 13 + 12 = 25
  • 8 + 16 = 24
  • 13 + 8 = 21
  • 5 + 16 = 21

So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?

Also, what is the fastest algorithm? The arrays are already sorted, with sizes M and N.


I am already aware of the O(k log k) solution using a max-heap, given here.

It is also one of the favorite Google interview questions, and they demand an O(k) solution.

I've also read somewhere that there exists an O(k) solution, which I am unable to figure out.

Can someone explain the correct solution with pseudocode?

P.S. Please DON'T post this link as an answer/comment. It DOESN'T contain the answer.

asked Sep 01 '13 by Spandan


1 Answer

I'll start with a simple but not quite linear-time algorithm. We choose some value between array1[0]+array2[0] and array1[N-1]+array2[N-1]. Then we determine how many pair sums are greater than this value and how many are less. This can be done by iterating over the arrays with two pointers: the pointer into the first array is incremented when the sum is too large, and the pointer into the second array is decremented when the sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search), we can find the Kth largest sum in O(N log R) time, where N is the size of the larger array and R is the number of possible values between array1[N-1]+array2[N-1] and array1[0]+array2[0]. This algorithm has linear time complexity only when the array elements are integers bounded by a small constant.
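Here is a minimal sketch of this first algorithm, assuming integer elements, ascending sort order, and 1 <= k <= M*N; the names countGreaterEqual and kthLargestSum are mine, not from the answer:

```cpp
#include <cstdint>
#include <vector>

// Count pairs (a[i], b[j]) with a[i] + b[j] >= value. Both arrays are
// sorted ascending. O(M + N): as x moves right through a, j only moves left.
int64_t countGreaterEqual(const std::vector<int64_t>& a,
                          const std::vector<int64_t>& b, int64_t value) {
    int64_t count = 0;
    int j = static_cast<int>(b.size()) - 1;
    for (int64_t x : a) {
        while (j >= 0 && x + b[j] >= value) --j;  // j: last index with sum < value
        count += static_cast<int64_t>(b.size()) - 1 - j;
    }
    return count;
}

// Binary search on the sum value: the answer is the largest v such that
// at least k pair sums are >= v. O((M + N) log R) overall.
int64_t kthLargestSum(const std::vector<int64_t>& a,
                      const std::vector<int64_t>& b, int64_t k) {
    int64_t lo = a.front() + b.front();  // smallest possible sum
    int64_t hi = a.back() + b.back();    // largest possible sum
    while (lo < hi) {
        int64_t mid = lo + (hi - lo + 1) / 2;
        if (countGreaterEqual(a, b, mid) >= k)
            lo = mid;                    // still at least k sums >= mid
        else
            hi = mid - 1;
    }
    return lo;
}
```

With the example arrays above, kthLargestSum({2, 3, 5, 8, 13}, {4, 8, 12, 16}, 4) returns 21, the sum of the pair (13, 8).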

The previous algorithm can be improved if we stop the binary search as soon as the number of pair sums in the binary search range decreases from O(N²) to O(N). Then we fill an auxiliary array with these pair sums (this can be done with a slightly modified two-pointer algorithm), and we use the quickselect algorithm to find the Kth largest sum in this auxiliary array. All this does not improve worst-case complexity, because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get a proper value range) use something better than binary search?
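A sketch of that collection step, under the same assumptions as before (sumsInRange is a hypothetical helper name). Once the search has narrowed to a half-open range [lo, hi) containing O(N) sums, collect them and quickselect:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Collect every pair sum s with lo <= s < hi. For each element of a, the
// valid indices into b form a contiguous window, and both window ends move
// only left as we advance through a, so the total work is
// O(M + N + number of collected sums).
std::vector<int64_t> sumsInRange(const std::vector<int64_t>& a,
                                 const std::vector<int64_t>& b,
                                 int64_t lo, int64_t hi) {
    std::vector<int64_t> out;
    int upper = static_cast<int>(b.size()) - 1;  // last j with x + b[j] < hi
    int lower = static_cast<int>(b.size());      // first j with x + b[j] >= lo
    for (int64_t x : a) {
        while (upper >= 0 && x + b[upper] >= hi) --upper;
        while (lower > 0 && x + b[lower - 1] >= lo) --lower;
        for (int j = lower; j <= upper; ++j) out.push_back(x + b[j]);
    }
    return out;
}

// Usage: if cntHi sums are >= hi and cntHi < k, the answer is the
// (k - cntHi)-th largest sum inside the window:
//   std::vector<int64_t> w = sumsInRange(a, b, lo, hi);
//   std::nth_element(w.begin(), w.begin() + (k - cntHi - 1), w.end(),
//                    std::greater<int64_t>());
//   int64_t answer = w[k - cntHi - 1];
```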

We could estimate the value range with the following trick: take every second element from each array and try to find the pair sum with rank k/4 for these half-arrays (using the same algorithm recursively). Obviously this gives some approximation of the needed value range, and in fact a slightly improved variant of this trick gives a range containing only O(N) elements. This is proven in the following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. The paper contains a detailed explanation of the algorithm, a proof, a complexity analysis, and pseudo-code for every part of the algorithm except quickselect. If linear worst-case complexity is required, quickselect can be augmented with the median-of-medians algorithm.

This algorithm has complexity O(N). If one of the arrays is shorter than the other (M < N), we can assume the shorter array is extended to size N with some very small elements, so that all calculations in the algorithm use the size of the larger array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes the algorithm a little faster but does not improve the asymptotic complexity.

If k < N, we can ignore all the array elements with index greater than k; in this case the complexity is O(k). If N < k < N(N-1), we simply get better complexity than requested in the OP. If k > N(N-1), we'd better solve the opposite problem: the k'th smallest sum.

I uploaded a simple C++11 implementation to ideone. The code is not optimized and not thoroughly tested; I tried to keep it as close as possible to the pseudo-code in the linked paper. This implementation uses std::nth_element, which guarantees linear complexity only on average (not worst-case).


A completely different approach to finding the K'th sum in linear time is based on a priority queue (PQ). One variation is to insert the largest pair into the PQ, then repeatedly remove the top of the PQ and insert up to two pairs in its place (one with a decremented index in one array, the other with a decremented index in the other array), taking some measures to prevent inserting duplicate pairs. The other variation is to insert all possible pairs containing the largest element of the first array, then repeatedly remove the top of the PQ and insert in its place the pair with a decremented index in the first array and the same index in the second array. In this case there is no need to bother about duplicates.
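A minimal sketch of the second variation using an ordinary binary heap (so roughly O((N + K) log N) rather than the O(1)-PQ variant described below); kthLargestSumPQ is my own name, and it assumes 1 <= k <= M*N:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Seed the heap with every pair that uses the largest element of a, then
// on each pop push the pair one step down the same "column" of a. Each
// index pair (i, j) is generated exactly once, so no duplicate handling
// is needed.
int64_t kthLargestSumPQ(const std::vector<int64_t>& a,
                        const std::vector<int64_t>& b, int64_t k) {
    struct Entry { int64_t sum; int i, j; };
    auto less = [](const Entry& x, const Entry& y) { return x.sum < y.sum; };
    std::priority_queue<Entry, std::vector<Entry>, decltype(less)> pq(less);

    const int m = static_cast<int>(a.size());
    for (int j = 0; j < static_cast<int>(b.size()); ++j)
        pq.push({a[m - 1] + b[j], m - 1, j});  // one seed entry per column

    while (true) {
        Entry top = pq.top();
        pq.pop();
        if (--k == 0) return top.sum;          // this pop was the kth largest
        if (top.i > 0)                         // walk down the same column
            pq.push({a[top.i - 1] + b[top.j], top.i - 1, top.j});
    }
}
```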

OP mentions the O(K log K) solution where the PQ is implemented as a max-heap. But in some cases (when the array elements are evenly distributed integers with a limited range, and linear complexity is needed only on average, not worst-case) we could use an O(1)-time priority queue, for example as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This gives O(K) expected time complexity.

An advantage of this approach is the possibility of producing the first K elements in sorted order. The disadvantages are the limited choice of array element types, a more complex and slower algorithm, and worse asymptotic complexity: O(K) > O(N).

answered Sep 30 '22 by Evgeny Kluev