I have a large in-memory set (around 100K records) of plain CLR objects of a single defined type. This type has a public property int Id { get; set; }. What is the best .NET structure to hold this data and provide quick access to any item by its Id? More specifically, the data will be searched by Id inside a loop, so each lookup should be as fast as possible. The search might look like this:
// Find by id
var entity = entities.First(e => e.Id == id);
IEnumerable-based structures such as collections and lists walk through the elements one by one until the sought element is found. What are the alternatives? I believe there should be a way to search an array sorted by Id, like an index seek in a database.
Thanks
Results of testing, FYI: Dictionary is not just fast, it's incomparable. My small test showed the time drop from around 3000+ ms (calling First() on an IEnumerable) to 0 ms (using [index] on a Dictionary)!
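For anyone who wants to reproduce a comparison like this, here is a rough sketch rather than the original test. The (Id, Name) tuples, the 100,000 item count and the 1,000 probe ids are all assumptions made up for illustration, and it assumes a compiler with top-level statements (C# 9 or later). Timings will vary by machine, so none are quoted here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical data: 100,000 (Id, Name) tuples stand in for the real entities.
var entities = Enumerable.Range(0, 100_000)
    .Select(i => (Id: i, Name: "item" + i))
    .ToList();
var byId = entities.ToDictionary(e => e.Id);

// Probe ids spread across the whole list, including ones near the end.
var ids = Enumerable.Range(0, 1_000).Select(i => i * 100).ToArray();

var sw = Stopwatch.StartNew();
long linearSum = 0;
foreach (var id in ids)
    linearSum += entities.First(e => e.Id == id).Id;   // linear scan on every lookup
sw.Stop();
Console.WriteLine($"First():    {sw.ElapsedMilliseconds} ms");

sw.Restart();
long hashedSum = 0;
foreach (var id in ids)
    hashedSum += byId[id].Id;                          // hash lookup on every lookup
sw.Stop();
Console.WriteLine($"Dictionary: {sw.ElapsedMilliseconds} ms");
```

The sums are only there so both loops do comparable work and the results can be checked against each other.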
I would go with a Dictionary<TKey, TValue>:
var index = new System.Collections.Generic.Dictionary<int, T>();
where T is the type of objects that you want to access.
This is implemented as a hash table, i.e. looking up an item is done by computing the key's hash value (which is usually a very quick operation) and using that hash value as an index into a table. It's perhaps a bit of an oversimplification, but with a dictionary it almost doesn't matter how many entries you've stored: access time should stay approximately constant.
To add an entry, do index.Add(entity.Id, entity);
To check whether an item is in the collection, index.ContainsKey(id).
To retrieve an item by ID, index[id].
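A minimal sketch of those three operations together, plus TryGetValue, which is worth knowing because index[id] throws a KeyNotFoundException when the key is missing. The (Id, Name) tuples and the sample values are made up for illustration and stand in for your own type T; top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entities: (Id, Name) tuples stand in for the real type T.
var index = new Dictionary<int, (int Id, string Name)>();

var alice = (Id: 1, Name: "Alice");
var bob = (Id: 2, Name: "Bob");
index.Add(alice.Id, alice);
index.Add(bob.Id, bob);

// ContainsKey only answers whether the key is present.
bool hasAlice = index.ContainsKey(1);

// The indexer throws KeyNotFoundException for a missing key,
// so use it only when the key is known to exist.
var known = index[1];

// TryGetValue checks for the key and fetches the value in one lookup.
bool foundBob = index.TryGetValue(2, out var hit);
bool foundMissing = index.TryGetValue(99, out _);

Console.WriteLine($"{hasAlice} {known.Name} {foundBob} {hit.Name} {foundMissing}");
```

Inside a tight loop, TryGetValue avoids the double lookup you get from calling ContainsKey and then the indexer.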
Use a Dictionary<TKey, TValue>, where TKey is int and TValue is YourEntity.
Example
var dictionary = new Dictionary<int, YourEntity>();
dictionary.Add(obj1.Id, obj1);
// add the remaining objects the same way
Or, if you already have a collection of objects, you can create the dictionary with a query:
var dictionary = list.ToDictionary(obj => obj.Id, obj => obj);
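Note: key values must be unique, and ToDictionary throws an ArgumentException if it meets a duplicate key. If your collection is not unique by Id, filter the duplicates first. Be aware that a plain Distinct() uses the default equality comparer, so unless the type overrides Equals it only removes repeated references, not different objects that share an Id; grouping by Id (or DistinctBy in .NET 6 and later) does what you want here. Alternatively, if you're looping over the collection to build the dictionary manually, check ContainsKey before attempting an Add.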
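Here is one way the duplicate handling could look, as a sketch only. The sample list and its (Id, Name) tuples are made up for illustration, keeping the first item per Id is an assumption about what you want, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input in which Id 2 appears twice.
var list = new List<(int Id, string Name)>
{
    (1, "first"), (2, "second"), (2, "second again"), (3, "third"),
};

// Option 1: group by Id and keep the first item of each group.
var viaGroupBy = list
    .GroupBy(obj => obj.Id)
    .ToDictionary(g => g.Key, g => g.First());

// Option 2: build the dictionary manually, skipping keys already present.
var viaLoop = new Dictionary<int, (int Id, string Name)>();
foreach (var obj in list)
{
    if (!viaLoop.ContainsKey(obj.Id))
        viaLoop.Add(obj.Id, obj);
}

Console.WriteLine($"{viaGroupBy.Count} entries, {viaLoop.Count} entries");
```

Both options keep the first occurrence. If the last one should win instead, assigning with viaLoop[obj.Id] = obj in the loop overwrites earlier entries rather than skipping them.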
Generally, an in-memory lookup is best done with a Dictionary:
System.Collections.Generic.Dictionary<TKey, TValue>
If your data set no longer fits in memory, you would use a disk-based B-tree instead.
Based on the information given, a hash table is probably going to be the fastest. The Dictionary<TKey, TValue> class is going to give you the best trade-off between ease of use and performance. If you truly need maximum performance, I would try all of the following classes. They all perform differently in terms of memory usage, insert speed, and search speed:
In addition to performance, you may be concerned with multithreaded access. These two collections provide thread safety: