let's say I have a collection of some type, e.g.
IEnumerable<double> values;
Now I need to extract the k highest values from that collection, for some parameter k. This is a very simple way to do this:
values.OrderByDescending(x => x).Take(k)
However, this (if I understand this correctly) first sorts the entire list, then picks the first k elements. But if the list is very large, and k is comparatively small (smaller than log n), this is not very efficient - the list is sorted in O(n*log n), but I figure selecting the k highest values from a list should be more like O(n*k).
So, does anyone have any suggestion for a better, more efficient way to do this?
Using Max Heap log(n)) by using a max-heap. The idea is to simply construct a max-heap of size n and insert all the array elements [0…n-1] into it. Then pop first k-1 elements from it. Now k'th largest element will reside at the root of the max-heap.
Below are the steps: Sort the given array. Select the least non-negative value from the above array(say at index idx) using lower_bound() in C++. If a positive value doesn't exist in a given array, then the sum is always negative and none of the K element satisfies the given condition.
To find the largest element, the first two elements of array are checked and the largest of these two elements are placed in arr[0] the first and third elements are checked and largest of these two elements is placed in arr[0] . this process continues until the first and last elements are checked.
This gives a bit of a performance increase. Note that it's ascending rather than descending but you should be able to repurpose it (see comments):
static IEnumerable<double> TopNSorted(this IEnumerable<double> source, int n)
{
List<double> top = new List<double>(n + 1);
using (var e = source.GetEnumerator())
{
for (int i = 0; i < n; i++)
{
if (e.MoveNext())
top.Add(e.Current);
else
throw new InvalidOperationException("Not enough elements");
}
top.Sort();
while (e.MoveNext())
{
double c = e.Current;
int index = top.BinarySearch(c);
if (index < 0) index = ~index;
if (index < n) // if (index != 0)
{
top.Insert(index, c);
top.RemoveAt(n); // top.RemoveAt(0)
}
}
}
return top; // return ((IEnumerable<double>)top).Reverse();
}
Consider the below method:
static IEnumerable<double> GetTopValues(this IEnumerable<double> values, int count)
{
var maxSet = new List<double>(Enumerable.Repeat(double.MinValue, count));
var currentMin = double.MinValue;
foreach (var t in values)
{
if (t <= currentMin) continue;
maxSet.Remove(currentMin);
maxSet.Add(t);
currentMin = maxSet.Min();
}
return maxSet.OrderByDescending(i => i);
}
And the test program:
static void Main()
{
const int SIZE = 1000000;
const int K = 10;
var random = new Random();
var values = new double[SIZE];
for (var i = 0; i < SIZE; i++)
values[i] = random.NextDouble();
// Test values
values[SIZE/2] = 2.0;
values[SIZE/4] = 3.0;
values[SIZE/8] = 4.0;
IEnumerable<double> result;
var stopwatch = new Stopwatch();
stopwatch.Start();
result = values.OrderByDescending(x => x).Take(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
result = values.GetTopValues(K).ToArray();
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
}
On my machine results are 1002 and 14.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With