Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concurrent modification of a list while using copy constructor

Will the following code cause a ConcurrentModificationException or other side effects?

ArrayList<String> newList = new ArrayList<String>(list);

Considering that the size of list is very huge and another thread is concurrently modifying the list when above code is getting executed.

like image 836
Sushant Avatar asked Jun 14 '12 14:06

Sushant


2 Answers

Edit:

My initial response is yes but as @JohnVint correctly points out, it won't be a ConcurrentModificationException since under the covers ArrayList is duplicating the array using System.arrayCopy(...). See the code snippets at the end.

The problem is that another thread is making changes to the element array as you do this copy. You might get IndexOutOfBoundsException, uninitialized array values, or even some sort of native memory access exception since System.arraycopy(...) is done in native code.

You will need to synchronize on the list during both the update and the copy to protect against these race condition as well as establish the memory barrier to make sure the element array that backs the ArrayList is appropriately up-to-date.


public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    ...
}

// ArrayList
public Object[] toArray() {
        return Arrays.copyOf(elementData, size);
}

// Arrays
public static <T,U> T[] copyOf(U[] original, int newLength,
    Class<? extends T[]> newType) {
    ...
    System.arraycopy(original, 0, copy, 0,
        Math.min(original.length, newLength));
}

// System
public static native void arraycopy(Object src,  int  srcPos,
    Object dest, int destPos, int length);
like image 136
Gray Avatar answered Sep 22 '22 01:09

Gray


You need to think about what you are doing here. If list's class is not thread-safe, you could destroy list completely with this code, along with newList--a CME would be the least of your problems. (I'd suggest a class that doesn't throw CME's, but in this case a CME is a good thing.) Note also: this code is hard to test. You'll get somewhere between zero and a billion problem-free runs between every failure, and the failures might be very subtle, though they're more likely to be massive and beyond rational explanation.

The quickest fix is to lock list. You want to be sure to lock it everywhere it's used; you're not really locking the list, you're locking the block of code you're accessing it from. You have to lock all accesses. The downside is that you'll block the other thread while the new list is being created. This is really the way to go. However, if, as you say, "list is very huge", you may be concerned about performance, so I'll go on...

It can be worth doing this if newList is treated as immutable and you make frequent use of it once created. Lots of code can now read newList simultaneously without problems and without fear of inconsistencies. But there is still the hold-up with the initial creation.

The next step is to make list a java.util.ConcurrentLinkedQueue. (There's a concurrent map and set if you need something fancier.) This thing can have a bunch of threads reading it while a bunch more are added and deleting to it, and it always works. It may not contain what you think it contains, but an iterator won't go into an infinite loop (as might happen if list were a java.util.LinkedList). This lets newList get created on one core while your other thread works on another.

Downsides: if list was an ArrayList, you might find it a bit of work to switch over to a concurrent class. Concurrent classes use more memory and are usually slower than ArrayList. Much more important: the contents of list might be inconsistent. (Actually, you already have this problem.) You might add or remove entries A and B at the same time in the other thread, and expect that either both or neither will be in newList, when in fact it's quite easy for only one to be there, the iterator coming through after one's been added or removed but before the other has. (Single core machines don't have this problem as much.) But if list is already thought of as being in a constant, disordered flux, this may be just what you want.

Another, different, side effect: you have to be careful with large arrays and things that use them (like ArrayList and HashTable). They don't use less space when you remove entries, so you can end up with a bunch of large arrays with little data in them hogging most of your memory.

Worse, when you add entries, they free their old array and allocate a new, larger one, which results in fragmenting free memory. That is, free memory becomes mostly descarded chunks from old arrays, none of which is large enough to be used for the next allocation. The Garbage Collector will try to defragment all this, but it's a lot of work and the GC is inclined to throw out-of-memory exceptions rather than take the time to rearrange the free blocks so it can get the largest-yet memory block you just requested. So you get an out-of-memory error when only 10% of your memory is in use.

Arrays are the fastest thing going, but you need to use big ones with care. Be aware of each allocation and free. Give them a proper initial size so they won't reallocate space. (Pretend you're a C programmer.) Be kind to your GC. If you must create and free and resize big lists willy-nilly, consider using a linked class: LinkedList, TreeMap, ConcurrentLinkedQueue, etc. They only use little bits of memory, and GC's love them.

like image 33
RalphChapin Avatar answered Sep 20 '22 01:09

RalphChapin