The algo I have in mind is
That will give me O(NlogK). Is there a better algorithm? I can't do quick selection, because the array can't be stored in memory.
You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen. For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.
If we sort the array in ascending order, the kth element of an array will be the kth smallest element. To find the kth largest element, we can pass k= length(Array) – k.
Depending on your memory restrictions, you can use a modified version of the median-of-medians algorithm to solve the problem in O(n) time and O(k) space.
The idea is as follows. Maintain an array of size 2k in memory. Fill this buffer with the first 2k elements from the array, then run the median-of-medians algorithm on it to put the largest k elements in the first half of the array and the smallest k elements in the second half. Then, discard the smallest k elements. Now, load the next k elements into the second half of the array, use the median-of-medians algorithm to again put the top k elements in the left side and the bottom k elements on the right. If you iterate this process across the array - replacing the second half of the buffer with the next k elements from the array, then using median-of-medians to move the top k of those to the left half - then at the end you will have the top k elements in the left half of the array. Finding the smallest of those (in O(k) time) will then give you the kth largest element.
Overall, you end up making O(n / k) calls to the median-of-medians algorithm with an array of size O(k), which is O(n / k) calls to an algorithm that takes O(k) time, for a net runtime of O(n). This, combined with the final step, runs in O(n + k) = O(n) time. Moreover, since the memory usage for the median-of-medians step is O(k), and since you have a buffer of size O(k) lying around, this uses only O(k) memory. In other words, it's asymptotically faster than the min-heap solution and asymptotically equivalent in memory.
Hope this helps!
This is known as the http://en.wikipedia.org/wiki/Selection_algorithm
One algorithm in particular is http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
It is O(N)
time and O(1)
space. I believe it can be done in-place, if your array isn't sorted. If your array is being stored externally (hard disk, network, etc.), you can leverage the ~K
words you have to work with. If your array is dynamically generated by a function, you'll be in a similar situation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With