I'm a complete LINQ newbie, so I don't know if my LINQ is incorrect for what I need to do or if my expectations of performance are too high.
I've got a SortedList of objects, keyed by int; SortedList as opposed to SortedDictionary because I'll be populating the collection with pre-sorted data. My task is to find either the exact key or, if there is no exact key, the one with the next higher value. If the search is too high for the list (e.g. highest key is 100, but search for 105), return null.
// The structure of this class is unimportant. Just using
// it as an illustration.
public class CX
{
public int KEY;
public DateTime DT;
}
static CX getItem(int i, SortedList<int, CX> list)
{
var items =
(from kv in list
where kv.Key >= i
select kv.Key);
if (items.Any())
{
return list[items.Min()];
}
return null;
}
Given a list of 50,000 records, calling getItem 500 times takes about a second and a half. Calling it 50,000 times takes over 2 minutes. This performance seems very poor. Is my LINQ bad? Am I expecting too much? Should I be rolling my own binary search function?
First, your query is being evaluated twice (once for Any
, and once for Min
). Second, Min
requires that it iterate over the entire list, even though the fact that it's sorted means that the first item will be the minimum. You should be able to change this:
if (items.Any())
{
return list[items.Min()];
}
To this:
var default =
(from kv in list
where kv.Key >= i
select (int?)kv.Key).FirstOrDefault();
if(default != null) return list[default.Value];
return null;
UPDATE
Because you're selecting a value type, FirstOrDefault
doesn't return a nullable object. I have altered your query to cast the selected value to an int?
instead, allowing the resulting value to be checked for null
. I would advocate this over using ContainsKey
, as that would return true
if your list contained a value for 0
. For example, say you have the following values
0 2 4 6 8
If you were to pass in anything less than or equal to 8, then you would get the correct value. However, if you were to pass in 9, you would get 0 (default(int)
), which is in the list but isn't a valid result.
Writing a binary search on your own can be tough.
Fortunately, Microsoft already wrote a pretty robust one: Array.BinarySearch<T>
. This is, in fact, the method that SortedList<TKey, TValue>.IndexOfKey
uses internally. Only problem is, it takes a T[]
argument, instead of any IList<T>
(like SortedList<TKey, TValue>.Keys
).
You know what, though? There's this great tool called Reflector that lets you look at .NET source code...
Check it out: a generic BinarySearch
extension method on IList<T>
, taken straight from the reflected code of Microsoft's Array.BinarySearch<T>
implementation.
public static int BinarySearch<T>(this IList<T> list, int index, int length, T value, IComparer<T> comparer) {
if (list == null)
throw new ArgumentNullException("list");
else if (index < 0 || length < 0)
throw new ArgumentOutOfRangeException((index < 0) ? "index" : "length");
else if (list.Count - index < length)
throw new ArgumentException();
int lower = index;
int upper = (index + length) - 1;
while (lower <= upper) {
int adjustedIndex = lower + ((upper - lower) >> 1);
int comparison = comparer.Compare(list[adjustedIndex], value);
if (comparison == 0)
return adjustedIndex;
else if (comparison < 0)
lower = adjustedIndex + 1;
else
upper = adjustedIndex - 1;
}
return ~lower;
}
public static int BinarySearch<T>(this IList<T> list, T value, IComparer<T> comparer) {
return list.BinarySearch(0, list.Count, value, comparer);
}
public static int BinarySearch<T>(this IList<T> list, T value) where T : IComparable<T> {
return list.BinarySearch(value, Comparer<T>.Default);
}
This will let you call list.Keys.BinarySearch
and get the negative bitwise complement of the index you want in case the desired key isn't found (the below is taken basically straight from tzaman's answer):
int index = list.Keys.BinarySearch(i);
if (index < 0)
index = ~index;
var item = index < list.Count ? list[list.Keys[index]] : null;
return item;
Using LINQ on a SortedList
will not give you the benefit of the sort.
For optimal performance, you should write your own binary search.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With