Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ehcache - using a List<Integer> as the cache value

So here is the problem I am trying to solve - I have an Object with two integer fields that I want to cache

public class MyObject {
   int x;
   int y;
   ....
}

Now the field x is what I mainly match on - but there can be duplicates in which case I want to fall back on the second field (so that this.x=that.x and this.y=that.y). y can only be 25 distinct values. Now I know I could just combine the two as a String and use that as the cache key, but then I would have to try x+[25 possible values] to actually determine if it was not in the cache - making cache misses very expensive. I was thinking of trying to store a List<Integer> as the cache value for the field x and then if their was more then one, iterate down the list and look for a match on y.

Now if I use a ConcurrentList (or a Set if I care about duplicates - lets ignore that for now) will multiple threads be able to add to it and then put it back into the cache without race conditions? Is it possible that Ehcache might return two different List Objects to two threads and then when they add their new value to the list and attempt to put it back to the cache I could get undeterministic results? Do you see a better way of constructing this cache?

EDIT : I appreciate the answers below, but everyone seems to be missing the main point. Will this work? Could Ehcache actually return two different objects for the same cacheKey (say if the object was on disk during the call and it's serialized it twice, once for each call).

like image 212
Gandalf Avatar asked Nov 09 '10 21:11

Gandalf


2 Answers

It's absolutely possible that you get two different instances of your List (or of any Serializable)! Try this:

public static void main(final String[] args) throws Exception {
    final Cache cache = CacheManager.getInstance().getCache("smallCache");

    final List<String> list = new ArrayList<String>();
    cache.put(new Element("A", list));

    /* We put in a second element. Since maxElementsInMemory="1", this means
     * that "A" will be evicted from memory and written to disk. */
    cache.put(new Element("B", new ArrayList<String>())); 
    Thread.sleep(2000); // We need to wait a bit, until "A" is evicted.

    /* Imagine, the following happens in Thread 1: */
        final List<String> retrievedList1 =
                   (List<String>) cache.get("A").getValue();
        retrievedList1.add("From Thread 1");

    /* Meanwhile, someone puts something in the cache: */
        cache.put(new Element("C", new ArrayList<String>())); 

    Thread.sleep(2000); // Once again, we wait a bit, until "A" is evicted.

    /* Now the following happens in Thread 2: */
        final List<String> retrievedList2 =
                   (List<String>) cache.get("A").getValue();
        retrievedList2.add("From Thread 2");
        cache.put(new Element("A", retrievedList2));

    /* Meanwhile in Thread 1: */    
        cache.put(new Element("A", retrievedList1));

    /* Now let's see the result: */
    final List<String> resultingList =
                        (List<String>) cache.get("A").getValue();
    for (final String string : resultingList) {
        System.out.println(string);
    } /* Prints only "From Thread 1". "From Thread 2" is lost.
                 But try it with maxElementsInMemory="3", too!! */

    CacheManager.getInstance().shutdown();
}

I used the following in ehcache.xml:

<cache name="smallCache"
       maxElementsInMemory="1"
       eternal="true"
       overflowToDisk="true"
       diskPersistent="true"
       maxElementsOnDisk="200"
       memoryStoreEvictionPolicy="LRU"
       transactionalMode="off"
       >
</cache>

One solution may be to use Explicit Locking, which seems to be available for standalone (non-Terracotta) caches, too (since ehcache 2.1).

Another solution would be to only have one thread which can modify the List. If you have multiple threads which can modify it, and you don't use locking on the cache, then you can get exactly the undeterministic results you described!

like image 66
Chris Lercher Avatar answered Sep 20 '22 13:09

Chris Lercher


I have a different approach for you, which I just read in an article about geographic range searches.

Put two key-value pairs in the cache: One with only x as the key, and one with both x and y as the key. When you look in the cache, look for the x-and-y key first. If it's there, you found a perfect match. If it's not there, look for the x key and possibly find a match with different y value.

like image 36
Christian Semrau Avatar answered Sep 18 '22 13:09

Christian Semrau