Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find Kth largest number in single pass without storing the whole array

The algo I have in mind is

  • keep an MaxHeap of size K
  • insert each element
  • drop out smaller value if heap is full
  • In the end, Kth max is the smaller of MaxHeap

That will give me O(NlogK). Is there a better algorithm? I can't do quick selection, because the array can't be stored in memory.

like image 946
Wei Shi Avatar asked Jun 22 '11 21:06

Wei Shi


People also ask

How do you find the kth largest element in an unsorted array?

You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen. For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.

How do you find the largest KTH element?

If we sort the array in ascending order, the kth element of an array will be the kth smallest element. To find the kth largest element, we can pass k= length(Array) – k.


2 Answers

Depending on your memory restrictions, you can use a modified version of the median-of-medians algorithm to solve the problem in O(n) time and O(k) space.

The idea is as follows. Maintain an array of size 2k in memory. Fill this buffer with the first 2k elements from the array, then run the median-of-medians algorithm on it to put the largest k elements in the first half of the array and the smallest k elements in the second half. Then, discard the smallest k elements. Now, load the next k elements into the second half of the array, use the median-of-medians algorithm to again put the top k elements in the left side and the bottom k elements on the right. If you iterate this process across the array - replacing the second half of the buffer with the next k elements from the array, then using median-of-medians to move the top k of those to the left half - then at the end you will have the top k elements in the left half of the array. Finding the smallest of those (in O(k) time) will then give you the kth largest element.

Overall, you end up making O(n / k) calls to the median-of-medians algorithm with an array of size O(k), which is O(n / k) calls to an algorithm that takes O(k) time, for a net runtime of O(n). This, combined with the final step, runs in O(n + k) = O(n) time. Moreover, since the memory usage for the median-of-medians step is O(k), and since you have a buffer of size O(k) lying around, this uses only O(k) memory. In other words, it's asymptotically faster than the min-heap solution and asymptotically equivalent in memory.

Hope this helps!

like image 145
templatetypedef Avatar answered Oct 16 '22 00:10

templatetypedef


This is known as the http://en.wikipedia.org/wiki/Selection_algorithm

One algorithm in particular is http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

It is O(N) time and O(1) space. I believe it can be done in-place, if your array isn't sorted. If your array is being stored externally (hard disk, network, etc.), you can leverage the ~K words you have to work with. If your array is dynamically generated by a function, you'll be in a similar situation.

like image 33
ninjagecko Avatar answered Oct 15 '22 22:10

ninjagecko