Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashMap<String,Value>.remove() synchronized by using String.intern() on the Key, does this even work? Or is this broken code?

I recently came across the following construct

Map<String,Value> map = new HashMap<>();
...
Value getValue(String key) {
    synchronized (key.intern()) {
        return map.remove(key);
    }
}

Given that intern() is usually not that fast, I doubt this would outperform using synchronized, Collections.synchronizedMap(Map) or ConcurrentHashMap. But even if this construct would be faster then all other methods in this particular case: Is this properly synchronized? I doubt that this is thread safe, as the remove could happen while the hash table is reorganized. But even if this would work, I suspect that code to be broken given that HashMap javadoc states:

If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

like image 599
Flow Avatar asked Mar 17 '23 16:03

Flow


1 Answers

This is not sufficient to safely access a HashMap from multiple threads. In fact, it's all but guaranteed to break something. By synchronizing on a given key the map can still be unsafely modified concurrently, as long as separate threads are working with different keys.

Consider if these three threads were trying to run at the same time:

Thread 1                Thread 2                 Thread 3
synchronized("a") {     synchronized("a") {      synchronized("b") {
  map.remove("a");        map.remove("a");         map.remove("b");
}                       }                        }

Threads 1 and 2 would correctly wait for each other since they're synchronizing on the same object (Java interns string constants). But Thread 3 is unimpeded by the work going on in the other threads, and immediately enters its synchronized block since no one else is locking "b". Now two different synchronized blocks are interacting with map simultaneously, and all bets are off. Before long, your HashMap will be corrupt.

Collections.synchronizedMap() correctly uses the map itself as the synchronizing object and therefore locks the whole map, not just the keys being used. This is the only robust way to prevent internal corruption of a HashMap being accessed from multiple threads.

ConcurrentHashMap correctly does what I think the code you posted is trying to do by locking internally on subsets of all the keys in the map. That way, multiple threads can safely access different keys on different threads and never block each other - but if the keys happen to be in the same bucket, the map will still block. You can modify this behavior with the concurrencyLevel constructor argument.

See also: Java synchronized block vs. Collections.synchronizedMap


As an aside, lets suppose for the sake of argument that synchronized(key.intern()) was a reasonable way to concurrently access a HashMap. This would still be incredibly error prone. If just a single place in the application failed to call .intern() on a key, everything could come crashing down.

like image 191
dimo414 Avatar answered Mar 20 '23 06:03

dimo414