Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract the k maximum elements of a list

Tags:

let's say I have a collection of some type, e.g.

IEnumerable<double> values;

Now I need to extract the k highest values from that collection, for some parameter k. This is a very simple way to do this:

values.OrderByDescending(x => x).Take(k)

However, this (if I understand this correctly) first sorts the entire list, then picks the first k elements. But if the list is very large, and k is comparatively small (smaller than log n), this is not very efficient - the list is sorted in O(n*log n), but I figure selecting the k highest values from a list should be more like O(n*k).

So, does anyone have any suggestion for a better, more efficient way to do this?

like image 520
Henrik Berg Avatar asked Feb 26 '13 12:02

Henrik Berg


People also ask

How do you find the K max of an array?

Using Max Heap log(n)) by using a max-heap. The idea is to simply construct a max-heap of size n and insert all the array elements [0…n-1] into it. Then pop first k-1 elements from it. Now k'th largest element will reside at the root of the max-heap.

How do you select K elements from an array?

Below are the steps: Sort the given array. Select the least non-negative value from the above array(say at index idx) using lower_bound() in C++. If a positive value doesn't exist in a given array, then the sum is always negative and none of the K element satisfies the given condition.

How do you find the maximum number of elements?

To find the largest element, the first two elements of array are checked and the largest of these two elements are placed in arr[0] the first and third elements are checked and largest of these two elements is placed in arr[0] . this process continues until the first and last elements are checked.


2 Answers

This gives a bit of a performance increase. Note that it's ascending rather than descending but you should be able to repurpose it (see comments):

static IEnumerable<double> TopNSorted(this IEnumerable<double> source, int n)
{
    List<double> top = new List<double>(n + 1);
    using (var e = source.GetEnumerator())
    {
        for (int i = 0; i < n; i++)
        {
            if (e.MoveNext())
                top.Add(e.Current);
            else
                throw new InvalidOperationException("Not enough elements");
        }
        top.Sort();
        while (e.MoveNext())
        {
            double c = e.Current;
            int index = top.BinarySearch(c);
            if (index < 0) index = ~index;
            if (index < n)                    // if (index != 0)
            {
                top.Insert(index, c);
                top.RemoveAt(n);              // top.RemoveAt(0)
            }
        }
    }
    return top;  // return ((IEnumerable<double>)top).Reverse();
}
like image 196
Rawling Avatar answered Oct 07 '22 23:10

Rawling


Consider the below method:

static IEnumerable<double> GetTopValues(this IEnumerable<double> values, int count)
{
    var maxSet = new List<double>(Enumerable.Repeat(double.MinValue, count));
    var currentMin = double.MinValue;

    foreach (var t in values)
    {
        if (t <= currentMin) continue;
        maxSet.Remove(currentMin);
        maxSet.Add(t);
        currentMin = maxSet.Min();
    }

    return maxSet.OrderByDescending(i => i);
}

And the test program:

static void Main()
{
    const int SIZE = 1000000;
    const int K = 10;
    var random = new Random();

    var values = new double[SIZE];
    for (var i = 0; i < SIZE; i++)
        values[i] = random.NextDouble();

    // Test values
    values[SIZE/2] = 2.0;
    values[SIZE/4] = 3.0;
    values[SIZE/8] = 4.0;

    IEnumerable<double> result;

    var stopwatch = new Stopwatch();

    stopwatch.Start();
    result = values.OrderByDescending(x => x).Take(K).ToArray();
    stopwatch.Stop();
    Console.WriteLine(stopwatch.ElapsedMilliseconds);

    stopwatch.Restart();
    result = values.GetTopValues(K).ToArray();
    stopwatch.Stop();
    Console.WriteLine(stopwatch.ElapsedMilliseconds);
}

On my machine results are 1002 and 14.

like image 36
Ryszard Dżegan Avatar answered Oct 07 '22 23:10

Ryszard Dżegan