
In-memory indexes

I have the concept of a Session which stores objects in various states.

Sometimes I need to scan the Session for objects matching a particular query. I do this a lot, and performance testing has shown it is becoming a bottleneck in some areas.

Therefore I would like to introduce the concept of indexes on a Session.

Something like...

public IDictionary<K, V> GetIndex<K, V>(Func<V, K> keySelector)

However, I'm not sure how to test "equality" of a Func like this. Obviously I want the index to be built only on the first call to GetIndex, and subsequent calls should reuse it rather than build it again.

How should I be mapping these internally to do index existence lookups?

IDictionary<???, IDictionary<K, V>> indexes = ...

Basically, what type should the ??? be? Maybe I can't do this using a Func, but perhaps there is some other way.

asked May 19 '11 16:05 by Mike Q




1 Answer

The simplest approach is probably to compute a hash of the query, and insert the results into your dictionary using the hash as the key.
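As a rough sketch of that idea, the index cache could be keyed on a string that identifies the query. Everything here is an assumption for illustration: the Session shape, the extra indexName parameter, and the BuildCount counter are not part of the original question. The sketch uses the string itself as the dictionary key rather than a raw hash code, so that Dictionary does the hashing and two different strings that happen to share a hash code cannot be confused.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Session: the class shape and the indexName parameter are
// assumptions for illustration, not taken from the original question.
public class Session
{
    private readonly List<object> objects = new List<object>();

    // One cached index per name. Each value is an IDictionary<K, V>,
    // stored as object because K and V differ from index to index.
    private readonly Dictionary<string, object> indexes =
        new Dictionary<string, object>();

    // Counts how many times an index was actually built (for demonstration).
    public int BuildCount { get; private set; }

    public void Add(object item)
    {
        objects.Add(item);
    }

    public IDictionary<K, V> GetIndex<K, V>(string indexName, Func<V, K> keySelector)
    {
        object cached;
        if (indexes.TryGetValue(indexName, out cached))
        {
            // Throws InvalidCastException if the same name is reused
            // with different type arguments.
            return (IDictionary<K, V>)cached;
        }

        // First call for this name: scan the session once and keep the result.
        // ToDictionary throws if two objects produce the same key.
        Dictionary<K, V> built = objects.OfType<V>().ToDictionary(keySelector);
        indexes[indexName] = built;
        BuildCount++;
        return built;
    }
}
```

Calling GetIndex<int, string>("byLength", s => s.Length) twice would return the same dictionary instance and scan the session only once. Note that this sketch never invalidates an index, so objects added after the first call would not appear in it.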

If your queries are strings, you can probably just use string.GetHashCode to compute a simple hash of the string data. If your queries are LINQ queries, GetHashCode probably won't work unless LINQ specifically overrides it to compute a hash over the expression tree rather than over the object instance. The default implementation of GetHashCode for reference types returns a value derived from the identity of the object instance, with no consideration of the data content of the object.
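The difference can be seen in a small experiment. The class and method names below are made up for illustration. The sketch compares two strings with the same content, two separately written lambdas with the same body, two expression trees with the same body, and one delegate instance compared with itself.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical demo class; the names are assumptions for illustration.
public static class EqualityDemo
{
    // Two distinct string instances with the same content.
    public static bool StringsMatch()
    {
        string a = "Status == Active";
        string b = new string(a.ToCharArray());
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    // Two lambdas with identical bodies, written separately.
    public static bool LambdasMatch()
    {
        Func<string, int> f = s => s.Length;
        Func<string, int> g = s => s.Length;
        return f.Equals(g);
    }

    // Two expression trees with identical bodies, written separately.
    public static bool ExpressionsMatch()
    {
        Expression<Func<string, int>> e1 = s => s.Length;
        Expression<Func<string, int>> e2 = s => s.Length;
        return e1.Equals(e2);
    }

    // The same delegate instance, referenced twice.
    public static bool SameDelegateMatches()
    {
        Func<string, int> f = s => s.Length;
        Func<string, int> sameReference = f;
        return f.Equals(sameReference);
    }
}
```

With the current C# compiler I would expect only StringsMatch and SameDelegateMatches to return true, because each lambda in source becomes its own method and expression trees compare by reference. That suggests a Func can serve as a dictionary key only if callers hold on to one shared delegate instance (for example in a static readonly field) and pass that same instance every time.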

If your queries are strings and are fairly uniform/consistent in construction, computing a simple string hash should be sufficient for reducing query traffic using the cache. If your queries are less consistent in structure (equivalent queries but with arguments in a different order, for example) you may need to build your own hash function that computes a hash on a canonicalized form of the input query to improve cache hit rates for queries that are logically equivalent but textually different.
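A minimal sketch of such a canonicalization step, assuming a made-up query format in which a query is a flat list of clauses joined by " AND " (the format and the QueryKey name are hypothetical, not from the question):

```csharp
using System;
using System.Linq;

// Hypothetical helper; assumes queries look like "a = 1 AND b = 2".
public static class QueryKey
{
    // Produces the same string for queries whose clauses differ only
    // in order or in surrounding whitespace.
    public static string Canonicalize(string query)
    {
        string[] clauses = query
            .Split(new[] { " AND " }, StringSplitOptions.RemoveEmptyEntries)
            .Select(clause => clause.Trim())
            .OrderBy(clause => clause, StringComparer.Ordinal)
            .ToArray();

        return string.Join(" AND ", clauses);
    }
}
```

Under this assumption, "b = 2 AND  a = 1" and "a = 1 AND b = 2" both canonicalize to "a = 1 AND b = 2", so they would share one cache entry. A real query language with nesting, OR, or quoting would need a proper parser rather than a string split.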

As your hash computation grows more expensive, it diminishes the performance gains of using a cache. Make sure the query operation is expensive enough to justify spending time computing hashes and consuming memory for the cache, so that you get a net savings in execution time. The query operation should cost at least two orders of magnitude more than the hash calculation and cache management overhead. If your query operation is an out-of-process or cross-network call, your cache overhead will almost certainly be dwarfed by the cost of the query.

answered Sep 24 '22 13:09 by dthorpe