Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding collections concurrency and Collections.synchronized*

I learned yesterday that I've been incorrectly using collections with concurrency for many, many years.

Whenever I create a collection that needs to be accessed by more than one thread I wrap it in one of the Collections.synchronized* methods. Then, whenever mutating the collection I also wrap it in a synchronized block (I don't know why I was doing this, I must have thought I read it somewhere).

However, after reading the API more closely, it seems you need the synchronized block when iterating the collection. From the API docs (for Map):

It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:

And here's a small example:

List<O> list = Collections.synchronizedList(new ArrayList<O>());
...
synchronized(list) {
   for(O o: list) { ... }
}

So, given this, I have two questions:

  • Why is this even necessary? The only explanation I can think of is they're using a default iterator instead of a managed thread-safe iterator, but they could have created a thread-safe iterator and fixed this mess, right?

  • More importantly, what is this accomplishing? By putting the iteration in a synchronized block you are preventing multiple threads from iterating at the same time. But another thread could mutate the list while iterating so how does the synchronized block help there? Wouldn't mutating the list somewhere else screw with the iteration whether it's synchronized or not? What am I missing?

Thanks for the help!

like image 562
rjcarr Avatar asked Sep 19 '14 19:09

rjcarr


4 Answers

Why is this even necessary? The only explanation I can think of is they're using a default iterator instead of a managed thread-safe iterator, but they could have created a thread-safe iterator and fixed this mess, right?

Iterating works with one element at a time. For the Iterator to be thread-safe, they'd need to make a copy of the collection. Failing that, any changes to the underlying Collection would affect how you iterate with unpredictable or undefined results.

More importantly, what is this accomplishing? By putting the iteration in a synchronized block you are preventing multiple threads from iterating at the same time. But another thread could mutate the list while iterating so how does the synchronized block help there? Wouldn't mutating the list somewhere else screw with the iteration whether it's synchronized or not? What am I missing?

The methods of the object returned by synchronizedList(List) work by synchronizing on the instance. So no other thread could be adding/removing from the same List while you are inside a synchronized block on the List.

like image 112
Sotirios Delimanolis Avatar answered Nov 09 '22 08:11

Sotirios Delimanolis


The basic case

All of the methods of the object returned by Collections.synchronizedList() are synchronized to the list object itself. Whenever a method is called from one thread, every other thread calling any method of it is blocked until the first call finishes.

So far so good.

Iterare necesse est

But that doesn't stop another thread from modifying the collection when you're between calls to next() on its Iterator. And if that happens, your code will fail with a ConcurrentModificationException. But if you do the iteration in a synchronized block too, and you synchronize on the same object (i.e. the list), this will stop other threads from calling any mutator methods on the list, they have to wait until your iterating thread releases the monitor for the list object. The key is that the mutator methods are synchronized to the same object as your iterator block, this is what's stopping them.

We're not out of the woods yet...

Note though that while the above guarantees basic integrity, it doesn't guarantee correct behaviour at all times. You might have other parts of your code that make assumptions which don't hold up in a multi-threaded environment:

List<Object> list = Collections.synchronizedList( ... );
...
if (!list.contains( "foo" )) {
   // there's nothing stopping another thread from adding "foo" here itself, resulting in two copies existing in the list
   list.add( "foo" );
}
...
synchronized( list ) { //this block guarantees that "foo" will only be added once
  if (!list.contains( "foo" )) {
     list.add( "foo" );
  }
}

Thread-safe Iterator?

As for the question about a thread-safe iterator, there is indeed a list implementation with it, it's called CopyOnWriteArrayList. It is incredibly useful but as indicated in the API doc, it is limited to a handful of use cases only, specifically when your list is only modified very rarely but iterated over so frequently (and by so many threads) that synchronizing iterations would cause a serious bottle-neck. If you use it inappropriately, it can vastly degrade the performance of your application, as each and every modification of the list creates an entire new copy.

like image 41
biziclop Avatar answered Nov 09 '22 08:11

biziclop


Synchronizing on the returned list is necessary, because internal operations synchronize on a mutex, and that mutex is this, i.e. the synchronized collection itself.

Here's some relevant code from Collections, constructors for SynchronizedCollection, the root of the synchronized collection hierarchy.

    SynchronizedCollection(Collection<E> c) {
        if (c==null)
            throw new NullPointerException();
        this.c = c;
        mutex = this;
    }

(There is another constructor that takes a mutex, used to initialize synchronized "view" collections from methods such as subList.)

If you synchronize on the synchronized list itself, then that does prevent another thread from mutating the list while you're iterating over it.

The imperative that you synchronize of the synchronized collection itself exists because if you synchronize on anything else, then what you have imagined could happen - another thread mutating the collection while you're iterating over it, because the objects locked are different.

like image 43
rgettman Avatar answered Nov 09 '22 08:11

rgettman


Sotirios Delimanolis answered your second question "What is this accomplishing?" effectively. I wanted to amplify his answer to your first question:

Why is this even necessary? The only explanation I can think of is they're using a default iterator instead of a managed thread-safe iterator, but they could have created a thread-safe iterator and fixed this mess, right?

There are several ways to approach making a "thread-safe" iterator. As is typical with software systems, there are multiple possibilities, and they offer different tradeoffs in terms of performance (liveness) and consistency. Off the top of my head I see three possibilities.

1. Lockout + Fail-fast

This is what's suggested by the API docs. If you lock the synchronized wrapper object while iterating it (and the rest of the code in the system written correctly, so that mutation method calls also all go through the synchronized wrapper object), the iteration is guaranteed to see a consistent view of the contents of the collection. Each element will be traversed exactly once. The downside, of course, is that other threads are prevented from modifying or even reading the collection while it's being iterated.

A variation of this would use a reader-writer lock to allow reads but not writes during iteration. However, the iteration itself can mutate the collection, so this would spoil consistency for readers. You'd have to write your own wrapper to do this.

The fail-fast comes into play if the lock isn't taken around the iteration and somebody else modifies the collection, or if the lock is taken and somebody violates the locking policy. In this case if the iteration detects that the collection has been mutated out from under it, it throws ConcurrentModificationException.

2. Copy-on-write

This is the strategy employed by CopyOnWriteArrayList among others. An iterator on such a collection does not require locking, it will always show consistent results during iterator, and it will never throw ConcurrentModificationException. However, writes will always copy the entire array, which can be expensive. Perhaps more importantly, the notion of consistency is altered. The contents of the collection might have changed while you were iterating it -- more precisely, while you were iterating a snapshot of its state some time in the past -- so any decisions you might make now are potentially out of date.

3. Weakly Consistent

This strategy is employed by ConcurrentLinkedDeque and similar collections. The specification contains the definition of weakly consistent. This approach also doesn't require any locking, and iteration will never throw ConcurrentModificationException. But the consistency properties are extremely weak. For example, you might attempt to copy the contents of a ConcurrentLinkedDeque by iterating over it and adding each element encountered to a newly created List. But other threads might be modifying the deque while you're iterating it. In particular, if a thread removes an element "behind" where you've already iterated, and then adds an element "ahead" of where you're iterating, the iteration will probably observe both the removed element and the added element. The copy will thus have a "snapshot" that never actually existed at any point in time. Ya gotta admit that's a pretty weak notion of consistency.

The bottom line is that there's no simple notion of making an iterator thread safe that would "fix this mess". There are several different ways -- possibly more than I've explained here -- and they all involve differing tradeoffs. It's unlikely that any one policy will "do the right thing" in all circumstances for all programs.

like image 29
Stuart Marks Avatar answered Nov 09 '22 06:11

Stuart Marks