 

LINQ to Objects and improved perf with an Index?

I am using LINQ to Objects and wonder if it is possible to improve the performance of my queries by making use of an index that I have. This is best explained with an example. Imagine a simple type...

public class Person
{
    public int Age;
    public string FirstName;
    public string LastName;
}

And a simple query I would make against it...

List<Person> people = new List<Person>();

// 'people' populated with 50,000 instances...

var x = from t in people
        where t.Age > 18 && t.Age < 21
        select t;

If I understand LINQ to Objects correctly, the implementation of the Where extension method will enumerate all 50,000 instances in the people collection in order to find the 100 that actually match. As it happens, I already have an index of the people collection that is sorted by Age, like this...
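One way to check that understanding is to count how often the predicate runs. This is a minimal sketch: `WhereScanDemo` and `CountPredicateCalls` are hypothetical names, and the test data is assumed to be 50,000 synthetic people with ages 0 to 99.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Age;
    public string FirstName;
    public string LastName;
}

public static class WhereScanDemo
{
    // Runs the question's filter through Enumerable.Where and counts how many
    // times the predicate is invoked. Where knows nothing about any sorted
    // index, so it has to test every element of the list once.
    public static int CountPredicateCalls(List<Person> people, out int matches)
    {
        int calls = 0;
        var query = people.Where(t =>
        {
            calls++;
            return t.Age > 18 && t.Age < 21;
        });
        matches = query.Count(); // forces a full enumeration of the query
        return calls;
    }
}
```

With 50,000 people the predicate is called 50,000 times, however few of them actually match.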

SortedList<int, Person> ageSorted = new SortedList<int, Person>();

Clearly it would make sense if I could get Where to use the SortedList, so that instead of enumerating all 50,000 instances it could go straight to the range of 100 matching entries and save time.

Is it possible to extend LINQ to Objects to enable my situation? Or is it already possible and I am just missing the technique?

asked Feb 03 '23 13:02 by Phil Wright


1 Answer

There's already a project which I believe does exactly that: i4o. I can't say I've used it myself, but it sounds like the kind of thing you want. You may need to juggle your existing code a bit, but it's certainly worth looking at.

If that doesn't help, you could at least write your own extension methods on SortedList<TKey, TValue>. You probably wouldn't be able to easily use your actual where clause, but you could use your own methods taking a minimum and a maximum value. You might also want to make them apply to IList<T> where you assert that you've already sorted the values appropriately (according to some comparer).

For example (completely untested):

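using System;
using System.Collections.Generic;

public static class ListRangeExtensions
{
    public static IEnumerable<T> Between<T, TKey>(this IList<T> source,
                                                  Func<T, TKey> projection,
                                                  TKey minKeyInclusive,
                                                  TKey maxKeyExclusive,
                                                  IComparer<TKey> comparer)
    {
        comparer = comparer ?? Comparer<TKey>.Default;

        // Binary search for the first index whose key is >= minKeyInclusive.
        // Assumes source is already sorted by projection under comparer.
        int low = 0, high = source.Count;
        while (low < high)
        {
            int mid = low + (high - low) / 2;
            if (comparer.Compare(projection(source[mid]), minKeyInclusive) < 0)
                low = mid + 1;
            else
                high = mid;
        }

        // Yield items until a key reaches the exclusive upper bound.
        int index = low;
        while (index < source.Count &&
               comparer.Compare(projection(source[index]), maxKeyExclusive) < 0)
        {
            yield return source[index];
            index++;
        }
    }
}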
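The suggestion of extension methods directly on SortedList<TKey, TValue> can be sketched the same way, because SortedList exposes its keys and values as indexable IList<TKey> and IList<TValue> in key order, along with the Comparer it sorts by. `ValuesBetween` is a hypothetical name. One assumption worth flagging: SortedList keys must be unique (Add throws ArgumentException on a duplicate key), so an age index like the one in the question would in practice need to be something like SortedList<int, List<Person>>.

```csharp
using System;
using System.Collections.Generic;

public static class SortedListRangeExtensions
{
    // Hypothetical helper: yields the values whose keys fall in
    // [minKeyInclusive, maxKeyExclusive), using the list's own comparer.
    public static IEnumerable<TValue> ValuesBetween<TKey, TValue>(
        this SortedList<TKey, TValue> list,
        TKey minKeyInclusive,
        TKey maxKeyExclusive)
    {
        IList<TKey> keys = list.Keys;       // sorted, O(1) indexed access
        IList<TValue> values = list.Values; // same order as Keys
        IComparer<TKey> comparer = list.Comparer;

        // Lower bound: first index whose key is >= minKeyInclusive.
        int low = 0, high = keys.Count;
        while (low < high)
        {
            int mid = low + (high - low) / 2;
            if (comparer.Compare(keys[mid], minKeyInclusive) < 0)
                low = mid + 1;
            else
                high = mid;
        }

        // Walk forward until a key reaches the exclusive upper bound.
        for (int i = low; i < keys.Count && comparer.Compare(keys[i], maxKeyExclusive) < 0; i++)
            yield return values[i];
    }
}
```

The cost is one O(log n) search plus the size of the result, rather than a pass over the whole collection.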

(If you only have List<T> instead of IList<T>, you could use List<T>.BinarySearch, although you'd need to build a custom IComparer<T>.)
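A sketch of the List<T>.BinarySearch route, assuming the list is already sorted by Age and contains no nulls. When several elements compare equal, BinarySearch may return any one of them, so this hypothetical `AgeProbeComparer` never reports the search probe as equal to a real person. The search therefore always misses, and the bitwise complement of its result is the first index with Age >= the minimum. `AgeProbeComparer` and `AgeRange.Between` are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public int Age;
    public string FirstName;
    public string LastName;
}

// Orders people by Age and places one designated probe instance before
// every real person of the same age, so the probe never compares equal.
public sealed class AgeProbeComparer : IComparer<Person>
{
    private readonly Person probe;
    public AgeProbeComparer(Person probe) { this.probe = probe; }

    public int Compare(Person x, Person y)
    {
        int byAge = x.Age.CompareTo(y.Age);
        if (byAge != 0) return byAge;
        if (ReferenceEquals(x, y)) return 0;
        if (ReferenceEquals(x, probe)) return -1;
        if (ReferenceEquals(y, probe)) return 1;
        return 0;
    }
}

public static class AgeRange
{
    // Returns people with minAgeInclusive <= Age < maxAgeExclusive
    // from a list that is already sorted by Age.
    public static List<Person> Between(List<Person> sortedByAge,
                                       int minAgeInclusive, int maxAgeExclusive)
    {
        var probe = new Person { Age = minAgeInclusive };
        int index = sortedByAge.BinarySearch(probe, new AgeProbeComparer(probe));
        if (index < 0) index = ~index; // a miss encodes the insertion point

        var result = new List<Person>();
        while (index < sortedByAge.Count && sortedByAge[index].Age < maxAgeExclusive)
            result.Add(sortedByAge[index++]);
        return result;
    }
}
```

For the question's query this would be `AgeRange.Between(sortedPeople, 19, 21)`, which matches `t.Age > 18 && t.Age < 21` for integer ages.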

Also, have a look at SortedSet<T> in .NET 4.
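SortedSet<T> offers GetViewBetween(lowerValue, upperValue), which returns a view of the elements between two inclusive bounds. Because SortedSet<T> treats items its comparer calls equal as duplicates, this sketch assumes a hypothetical unique `Id` field on Person as a tiebreaker (it is not in the question's class). The probe objects use int.MinValue and int.MaxValue for Id so that everyone at the boundary ages is included. `AgeThenIdComparer` and `SortedSetRange.AgesBetween` are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public int Id; // hypothetical unique tiebreaker, not in the question's class
    public int Age;
    public string FirstName;
    public string LastName;
}

// Orders by Age, then by Id, so different people of the same age
// are not collapsed into one entry by SortedSet<T>.
public sealed class AgeThenIdComparer : IComparer<Person>
{
    public int Compare(Person x, Person y)
    {
        int byAge = x.Age.CompareTo(y.Age);
        return byAge != 0 ? byAge : x.Id.CompareTo(y.Id);
    }
}

public static class SortedSetRange
{
    // Everyone with minAge <= Age <= maxAge (both bounds inclusive).
    public static SortedSet<Person> AgesBetween(SortedSet<Person> set, int minAge, int maxAge)
    {
        var lower = new Person { Age = minAge, Id = int.MinValue };
        var upper = new Person { Age = maxAge, Id = int.MaxValue };
        return set.GetViewBetween(lower, upper);
    }
}
```

Since both bounds are inclusive, the question's `Age > 18 && Age < 21` becomes `AgesBetween(set, 19, 20)`. The returned view is live, so later changes to the underlying set within that range are reflected in it.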

answered Feb 05 '23 03:02 by Jon Skeet