Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimal algorithm for returning top k values from an array of length N

People also ask

How do you find the top k values within N values?

You can do this in O(n) by using a selection algorithm. Find the k th largest element with the partition algorithm, then all the elements after it will be larger than it, and those are your top k . If you need those top k in sorted order, you can then sort them in O(k log k) .

What is KTH largest element in array?

If we sort the array in ascending order, the kth element of an array will be the kth smallest element. To find the kth largest element, we can pass k= length(Array) – k.

What is the quicksort algorithm?

Quicksort is a fast sorting algorithm that works by splitting a large array of data into smaller sub-arrays. This implies that each iteration works by splitting the input into two components, sorting them, and then recombining them.


Method 1

Since k is small, you can use the tournament method to find the kth largest. This method is described in Knuth's Art of Programming, Volume 3, Page 212.

First create a tournament on n-k+2 elements. Something like a knockout tennis tournament. First you split into pairs and compare the members of the pairs (as if those two played a match and one lost). Then the winners, you split in to pairs again and so on, till you have a winner. You can view it as a tree, with the winner at the top.

This takes n-k+1 compares exactly.

Now the winner of these n-k+2 cannot be your kth largest element. Consider its path P up the tournament.

Of the remaining k-2 now pick one, and follow that up the path P which will give you a new largest. Basically you sort of redo the tournament with the previous winner being replaced by one of the k-2 elements. Let P be the path of the new winner. Now pick another from the k-3 and follow the new path up and so on.

At the end after you exhaust the k-2, replace the largest with -infinity and the largest of the tournament will be the kth largest. The elements you have thrown away are the top k-1 elements.

This takes at most n - k + (k-1) [log (n-k+2)] comparisons to find the top k. It uses O(n) memory though.

In terms of number of compares this should likely beat any selection algorithms.

Method 2

As an alternative, you can maintain a min-heap of k elements.

First insert k elements. Then for each element of array, if it is less than the min element of the heap, throw it away. Otherwise, delete-min of the heap and insert the element from the array.

At the end, the heap will contain the top k elements. This will take O(n log k) compares.

Of course, if n is small, just sorting the array should be good enough. The code will be simpler too.


You can do this in O(n) by using a selection algorithm. Find the kth largest element with the partition algorithm, then all the elements after it will be larger than it, and those are your top k.

If you need those top k in sorted order, you can then sort them in O(k log k).


Short answer: no.

Longer answer: yes, several mutually incompatible optimal solutions are known. It depends on n, k and what properties of the array you can guarantee.

If you know nothing about the array the lower bound of complexity is obviously O(n), because all elements of the source array must be examined to see if they fit in the top 10. If you know anything about the source array which allows elements to be safely skipped you should use that knowledge.

Similarly the upper complexity bound is O(n.log(n)) because you could always choose to find the answer by sorting the array (O(n.log(n)) and returning the first 10 items (O(1)).

A linear search comparing each item to the tenth-highest found so far and inserting it at the appropriate place in the list of highest-found-so-far items if needed has similar complexity for the average and best-case scenarios and has a worst-case of O(kn) which is significantly better than O(n-squared). For the sizes you estimate I expect this method to perform well.

If n were much larger (~10000) and k were increased in the same ratio it would probably be worthwhile implementing the quickselect algorithm. Quickselect performs better the more elements you want. If, however, k were not increased in scale with n you should stick with linear searching. Quickselect & friends modify the original array, so are less suitable if you cannot do this in place because you need a bunch more storage and a lot of copying that the algorithm complexity does not include.

If n is huge (~1e20) you would want to find the k largest from each of a number of partitions of the input array and then find the k-largest from the aggregate of those results, so that you were not trying to analyse more data than you can fit in memory at a time, and to allow the operation to be parallelised efficiently.


Following is a heap based elegant solution in Java with complexity O(nlogK). It's not the most efficient but i think it's easy enough to understand. You can change Integer to Float if you want a float based solution

import java.util.Arrays;
import java.util.PriorityQueue;

public class FindKLargest {

public static void find(int[] A, int k) {

    PriorityQueue<Integer> pq = new PriorityQueue<>(k);// Min heap because the element has to be greater
                                                        // than the smallest element in the heap in order
                                                        // to be qualified to be a member of top k elements.
    for (int i = 0; i < A.length; i++) {
        if (i < k) // add until heap is filled with k elements.
            pq.add(A[i]);
        else if (pq.peek() < A[i]) { // check if it's bigger than the
                                        // smallest element in the heap.
            pq.poll();
            pq.add(A[i]);
        }
    }
    int[] topK = new int[pq.size()];
    int index = 0;
    while (index != k)
        topK[index++] = pq.poll();
    System.out.println(Arrays.toString(topK));
}

public static void main(String[] args) {
    int[] arr = { 1, -2, -3, -4, -5 };
    find(arr, 4);
}

}