Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multithreaded usage of `ConcurrentHashMap` iterators

I need to write a somewhat specific implementation of a cache, which has unique keys but can contain duplicate values, e.g.:

 "/path/to/one" -> 1
 "/path/to/two" -> 2
 "/path/to/vienas" -> 1
 "/path/to/du" -> 2

The class needs to offer non blocking reads/key lookups but also has typical create/update/delete mutators. For example, removing value 2 should result

"/path/to/one" -> 1
"/path/to/vienas" -> 1

Reads for this cache will outweigh the writes by far so write performance is not an issue - as long as concurrent writes do not run on top of each other. The overall number of entries is much likely going to be less than 1000 so occasional iterating over values is still affordable.

So I wrote something like this (pseudo code):

//
// tl;dr all writes are synchronized on a single lock and each
// resets the reference to the volatile immutable map after finishing
//
class CopyOnWriteCache {
   private volatile Map<K, V> readOnlyMap = ImmutableMap.of();

   private final Object writeLock = new Object();

   public void add(CacheEntry entry) {
      synchronized (writeLock) {
         readOnlyMap = new ImmutableMap.Builder<K, V>()
            .addAll(readOnlyMap)
            .add(entry.key, entry.value)
            .build();
      }
   }

   public void remove(CacheEntry entry) {
      synchronized (writeLock) {
         Map<K, V> filtered = Maps.filterValues(readOnlyMap, somePredicate(entry));
         readOnlyMap = ImmutableMap.copyOf(filtered);
      }
   }

   public void update(CacheEntry entry) {
      synchronized (writeLock) {
         Map<K, V> filtered = Maps.filterValues(readOnlyMap, somePredicate(entry));
         readOnlyMap = new ImmutableMap.Builder<K, V>()
             .addAll(filtered)
             .add(entry.key, entry.value)
             .build();
      }
   }

   public SomeValue lookup(K key) {
      return readOnlyMap.get(key);
   }
}

After writing the above, I realized that ConcurrentHashMap also offers non-blocking reads which would make all my effort pointless, but there's a statement in its Javadoc which raises an eyebrow:

iterators are designed to be used by only one thread at a time

So if I replace the usage of volatile ImmutableMap with final ConcurrentHashMap and remove all synchronized blocks, is it possible that competing concurrent mutators will invalidate each other? For example, I can imagine how two concurrent calls to remove would lead to a race condition, completely invalidating the results of the first remove.

The only improvement I can see is that by using final ConcurrentHashMap and leaving synchronized as they are I could at least avoid unnecessary copying of data.

Does that make sense - or maybe I'm overlooking something here? Can anyone suggest other alternatives for this solution?

like image 359
mindas Avatar asked Jan 19 '12 22:01

mindas


2 Answers

If you made this replacement, you would still have only one thread using a given Iterator at once. The warning means that two threads should not use the same Iterator instance. Not that two threads can't iterate concurrently.

The problem you would have is that since the removal operation can't be done in a single atomic operation of the ConcurrentMap, you could have a concurrent thread see the map in an intermediate state: one value has been removed but not the other one.

I'm not sure this would be any faster, since you say that write performance is not an issue, but what you could do to avoid the copy of the map at each write, is to use a ReadWriteLock guarding a mutable ConcurrentMap. All the reads would still be concurrent, but a write to the map would prevent all the other threads to access the map. And you wouldn't have to create a fresh copy each time it's modified.

like image 115
JB Nizet Avatar answered Sep 18 '22 01:09

JB Nizet


It's fine to mutate ConcurrentHashMap through multiple iterators/multiple threads at the same time. It's just that you are not supposed to hand a single instance of iterator to multiple threads and use it concurrently.

So, if you use ConcurrentHashMap you don't need to leave synchronized in there. As JB Nizet notes, difference between this and your current implementation is the visibility of the intermediate state. If you don't care about that, using ConcurrentHashMap would be my preferred choice as the implementation is most simple and you won't have to worry about read or write performance.

like image 41
Enno Shioji Avatar answered Sep 20 '22 01:09

Enno Shioji