Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List with multiple indexes

Given a generic List I would need some kind of index (in the database sense) that would allow me fast retrieval. The keys for this index would not be unique, so I can't use a dictionary. Here's what I have in mind: Given a class Foo { P1, P2, P3 } that may have data like this

{ "aaa", 111, "yes" }
{ "aaa", 112, "no" }
{ "bbb", 111, "no" }
{ "bbb", 220, "yes" }
{ "bbb", 220, "no" }
{ "ccc", 300, "yes" }

I would need to quickly access all the records where P1 is "bbb" (3rd,4th, and 5th) or all the ones where P2 is 111 (1st and 3rd). I could use a sorted List but if I need more than one way of sorting / indexing I would end up with duplicated lists.

Is there something built-in into the .NET framework or maybe an OS library that would do something like this? Thanks.

P.S. I mentioned "sorted List" with the idea that a sorted list will return / find an item much faster. I do not need the list to be necessarily sorted; I'm just looking for fast retrieval / finding.

like image 922
pbz Avatar asked Jan 27 '10 00:01

pbz


1 Answers

Don't ever forget this principle: Make it correct, make it clear, make it concise, make it fast. In that order. So, first code up the naive implementation:

static IEnumerable<T> GetByIndex<T>(
    List<T> list,
    Func<T, TIndex> func,
    TIndex key
) {
    return list.Where(x => func(x) == key);
}

Usage:

List<Test> tests = new List<Test>() {
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "bbb", Value = 112, Valid = Valid.No },
            new Test { Name = "bbb", Value = 111, Valid = Valid.No },
            new Test { Name = "bbb", Value = 220, Valid = Valid.No },
            new Test { Name = "ccc", Value = 220, Valid = Valid.Yes }
};
IEnumerable<Test> lookup = GetByIndex(tests, x => x.Name, "bbb");

The above is correct, clear and concise. Almost surely it is fast enough for your purposes.

So, as far as making it fast you must first measure:

  1. Establish reasonable performance criterion.
  2. Establish a test-bed of real-world data.
  3. Profile the simple approach against the test-bed of real-world data. Note here that profiling includes deducing whether or not this functionality is a bottleneck in your application.

Then, if and only if this is not fast enough for you should you try to optimize. It wouldn't be too hard to implement an IndexedList<T> : ICollection<T> that would allow you to index off of various properties.

Here is a naive implementation that could get you started:

class IndexedList<T> : IEnumerable<T> {
    List<T> _list;
    Dictionary<string, Dictionary<object, List<T>>> _dictionary;
    Dictionary<string, Func<T, object>> _propertyDictionary;

    public IndexedList(IEnumerable<string> propertyNames) : this(propertyNames, new List<T>()) { }

    public IndexedList(IEnumerable<string> propertyNames, IEnumerable<T> source) {
        _list = new List<T>();
        _dictionary = new Dictionary<string, Dictionary<object, List<T>>>();
        _propertyDictionary = BuildPropertyDictionary(propertyNames);
        foreach (var item in source) {
            Add(item);
        }
    }

    static Dictionary<string, Func<T, object>> BuildPropertyDictionary(IEnumerable<string> keys) {
        var propertyDictionary = new Dictionary<string,Func<T,object>>();
        foreach (string key in keys) {
            ParameterExpression parameter = Expression.Parameter(typeof(T), "parameter");
            Expression property = Expression.Property(parameter, key);
            Expression converted = Expression.Convert(property, typeof(object));
            Func<T, object> func = Expression.Lambda<Func<T, object>>(converted, parameter).Compile();
            propertyDictionary.Add(key, func);
        }
        return propertyDictionary;
    }

    public void Add(T item) {
        _list.Add(item);
        foreach (var kvp in _propertyDictionary) {
            object key = kvp.Value(item);
            Dictionary<object, List<T>> propertyIndex;
            if (!_dictionary.TryGetValue(kvp.Key, out propertyIndex)) {
                propertyIndex = new Dictionary<object, List<T>>();
                _dictionary.Add(kvp.Key, propertyIndex);
            }
            List<T> list;
            if (!propertyIndex.TryGetValue(key, out list)) {
                list = new List<T>();
                propertyIndex.Add(key, list);
            }
            propertyIndex[key].Add(item);
        }
    }

    public IEnumerable<T> GetByIndex<TIndex>(string propertyName, TIndex index) {
        return _dictionary[propertyName][index];
    }

    public IEnumerator<T> GetEnumerator() {
        return _list.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

Usage:

List<Test> tests = new List<Test>() {
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "aaa", Value = 111, Valid = Valid.Yes },
            new Test { Name = "bbb", Value = 112, Valid = Valid.No },
            new Test { Name = "bbb", Value = 111, Valid = Valid.No },
            new Test { Name = "bbb", Value = 220, Valid = Valid.No },
            new Test { Name = "ccc", Value = 220, Valid = Valid.Yes }
};
// build an IndexedList<Text> indexed by Name and Value
IndexedList<Test> indexed = new IndexedList<Test>(new List<string>() { "Name", "Value" }, tests);
// lookup where Name == "bbb"
foreach (var result in indexed.GetByIndex("Name", "bbb")) {
    Console.WriteLine(result.Value);
}

But see, the reason you don't do this unless the naive implementation is not already fast enough is because of the additional complexity you just added to your system. You just added new code to maintain, new code to test and might not gain anything if this isn't faster on your real-world data or is not a bottleneck of your application.

like image 56
jason Avatar answered Sep 19 '22 12:09

jason