My code processes a huge number of values and I'm looking for an efficient structure to keep track of the top (N) values, where N is less than 10, so collecting ALL numbers then sorting the list and taking the first (N) is probably not the most efficient way.
To do that, I'm building a collection of fixed size N, to keep the top (N) values sorted in descending order. The Add(T value)
method of the sorted collection would add the value to the collection if value is higher than any of the existing values (in which case the last element is removed) or if the collection is not full.
I was able to implement what I wanted using a doubly LinkedList<T>
since it has fast insertion and removal, but I was wondering if using SortedDictionary<TKey, TValue>
or a priority queue would be better ?
Thank you.
I would simply use a heap with a limited depth. I do not know whether there already exists a library for that, but it should be easy to implement.
The main advantage to use a SortedDictionary or SortedList it is that you can skip the sorting intelligence because they handle it for you( e.g. You just have to remove the (n + 1)th element every time you add a value). But on the other hands adopt that sort of complex structure for 10 elements resembles to use a nuke to kill a fly...
Maybe the linked list is a good way, and also a simple linear comparison for inserting values in order is not so slower than binary search (we still speak about max 10 comparisons against ~3, current CPUs not event feel the difference).
EDIT:
fixed arrays can be used to build prioriry queues with binary heaps, that probably is the right way to implement this
The performance may really change.
For N < 10 any overly complex data structure will likely drag performance significantly (though perhaps not catastrophically) so I'd use an array to store the items.
Then there are 3 main possibilities on how to arrange the items in the array:
For such a small number, just keep an array. Scan the array keeping track of the smallest value and its position. If your new number is larger than the smallest on in the set, replace it. You should of course scan for the lowest value once after you insert a number, then just compare new numbers to that and only take action if you have something larger (replace and rescan).
Unless you have a solid reason to do otherwise, I'd use a priority queue.
There is one trick that can simplify the logic quite a bit. Most people's first idea is to look at each incoming item, and insert it into the collection iff the collection contains fewer items than desired, or the new item is larger than the smallest item currently in the collection.
You can simplify things quite a bit if you leave room for one extra item in the collection. Always insert each incoming item into the collection, and then if the collection is too large, remove the smallest item.
While a priority queue is arguably overkill for only 10 items, it keeps the logic simple, and is efficient both in terms of space and time, so if you ever need N=10000 (or whatever) it'll still work nicely.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With