Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Maintaining multiple indexes with guava cache (in-memory table)

I'm trying to implement a simplified in-memory cached "table" where there are 2 types of indexes: primary and secondary.

  • Primary index maps a single key (primary key) to a unique value (Map interface)

  • Secondary index maps a single key to a Collection of values (Multimap fits the bill)

Very similar to a table in RDBMS world where one has several lookup columns. Sometimes you want to search by PK, sometimes return a list of rows based on a common property. Right now, there is no need in other operations than equals (=) (ie no range queries, or pattern matching).

Add cache semantics to the above data structure (eviction, data population/cache loader, refresh etc.) and that's pretty much what is needed.

I would like to ask your advice on how to best approach given problem. Should it be Cache per index or Cache (for PK) + (synchronized) Multimap for secondary indexes ?

Any help is much appreciated.

Regards.

like image 439
Andrei Avatar asked May 30 '12 00:05

Andrei


2 Answers

You can replace a Map with a Guava com.google.common.cache.Cache. It doesn't support Multimap type semantics , so you'd have to use

Cache<K, ? extends List<V>> 

in that case.

For the sake of simplicity I would make the 'primary index' a subset of the secondary index - i.e. you have a single index that returns a list of values for a given key and primary keys just return a list with a single value.

like image 91
vladimir e. Avatar answered Sep 22 '22 18:09

vladimir e.


The challenge here is to maintain the integrity of two indexes regardless of whether you use two cache or even one cache for PK + multimap.

May be you should create a new cache class (say TableCache) which extends com.google.common.cache.Cache, internally this class can maintain a multimap instance variable for the secondary index (which can be a ConcurrentHashMap).

Then you can override Cache methods (put, get, invalidate etc) to keep the secondary index in sync.

Of course, you have to provide a get function to retrieve values based on secondary index.

This approach gives you the ability to maintain the integrity of primary and secondary indexes.

public class TableCache<K, V> extends Cache<K, V> {

    Map<K, List<V>> secondaryIndex = new ConcurrentHashMap<K, List<V>>();

    public void put(K key, V value) {
        super.put(key, value);
        // Update secondaryIndex
    }

}
like image 28
sperumal Avatar answered Sep 22 '22 18:09

sperumal