Characteristics of a volatile hashmap

I am trying to get a firm handle on how a variable declared as

private volatile HashMap<Object, ArrayList<String>> data;

would behave in a multi-threaded environment.

What I understand is that volatile means get from main memory and not from the thread cache. That means that if a variable is being updated I will not see the new values until the update is complete and I will not block, rather what I see is the last updated value. (This is exactly what I want BTW.)

My question is: when I retrieve the ArrayList<String> and add or remove strings in thread A while thread B is reading, what exactly is affected by the volatile keyword? Only the HashMap, or does the effect extend to the contents (K and V) of the HashMap as well? That is, when thread B gets an ArrayList<String> that is currently being modified in thread A, is what is actually returned the last value of the ArrayList<String> that existed before the update began?

Just to be clear, let's say the update is adding 2 strings. One string has already been added in thread A when thread B gets the array. Does thread B get the array as it was before the first string was added?

BigMac66 asked Mar 06 '14 01:03


3 Answers

"That means that if a variable is being updated I will not see the new values until the update is complete and I will not block, rather what I see is the last updated value"

This is the source of your confusion. volatile applies to the field itself: reads and writes of that field are atomic, so no other thread can ever see a partially written value.

A non-volatile long field (which may be written as two separate 32-bit halves, for example on a 32-bit machine) could be read incorrectly if a write operation was preempted after writing the first half and before writing the second.

Note that the atomicity of reads/writes to a field has nothing to do with updating the inner state of a HashMap. Updating the inner state of a HashMap entails multiple instructions, which are not atomic as a whole. That's why you'd use locks to synchronize access to the HashMap.

Also, since reads and writes of references are always atomic, even if the field is not marked volatile, there is no difference between a volatile and a non-volatile HashMap field as far as atomicity is concerned. In that case, what volatile gives you is visibility and ordering (acquire-release semantics): a thread that reads the reference written by another thread also sees everything the writing thread did before that write. The processor and the compiler are still allowed to reorder your instructions to some extent, but nothing that follows a volatile read may be moved above it, and nothing that precedes a volatile write may be moved below it.
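One way to get the "reader sees the last complete value and never blocks" behaviour from the question is to rely on exactly this ordering guarantee: build a new map privately, then publish it with a single volatile write. The following is only a sketch under that assumption; the class name SnapshotHolder and its methods are hypothetical and not part of the original question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder: readers only ever see fully built, unmodifiable snapshots.
class SnapshotHolder {
    private volatile Map<Object, List<String>> data = Collections.emptyMap();

    // Writer: copy, change the private copy, then publish with one volatile write.
    // synchronized so that two concurrent writers do not lose each other's updates.
    synchronized void add(Object key, String value) {
        Map<Object, List<String>> copy = new HashMap<>(data);
        List<String> values =
                new ArrayList<>(copy.getOrDefault(key, Collections.emptyList()));
        values.add(value);
        copy.put(key, Collections.unmodifiableList(values));
        // Volatile write: everything done above is visible to any thread
        // that subsequently reads the field.
        data = Collections.unmodifiableMap(copy);
    }

    // Reader: a single volatile read, no lock. The returned list is never
    // modified afterwards, so it cannot be seen half-updated.
    List<String> get(Object key) {
        return data.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        SnapshotHolder holder = new SnapshotHolder();
        holder.add("k", "first");
        List<String> before = holder.get("k"); // snapshot taken before the next update
        holder.add("k", "second");
        System.out.println(before);            // prints [first]
        System.out.println(holder.get("k"));   // prints [first, second]
    }
}
```

The trade-off is that every update copies the map, so this only suits data that is read far more often than it is written.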

dcastro answered Sep 18 '22 13:09


The volatile keyword here applies only to the reference to the HashMap, not to the data stored within it (in this case the ArrayLists).

As stated in HashMap documentation:

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

 Map m = Collections.synchronizedMap(new HashMap(...));
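
As a rough illustration of the documentation's advice applied to the map from the question: the wrapper only guards the map's own methods, so the lists stored as values need their own protection as well. The sketch below assumes Collections.synchronizedList for that; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class, not from the HashMap documentation.
class SynchronizedMapSketch {

    // Adds a value under the given key. computeIfAbsent runs under the
    // wrapper's lock, and the list it creates is itself a synchronized wrapper.
    static void add(Map<Object, List<String>> data, Object key, String value) {
        data.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>()))
            .add(value);
    }

    // Starts several writer threads that all append to the same key and
    // returns the final list size once every thread has finished.
    static int runWriters(int threads, int perThread) {
        Map<Object, List<String>> data = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    add(data, "k", "v");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return data.get("k").size();
    }

    public static void main(String[] args) {
        System.out.println(runWriters(4, 1000)); // prints 4000
    }
}
```

Note that iterating over a synchronized map or list still requires synchronizing on the wrapper manually, as the Collections documentation points out.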
Wins answered Sep 18 '22 13:09


The volatile keyword affects neither operations on the HashMap (e.g. put, get) nor operations on the ArrayLists within the HashMap. It only affects reads and writes of this particular reference to the HashMap. There can be further references to the same HashMap, and those are not affected.
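
A small single-threaded sketch can make the distinction between the reference and the object concrete. The class name AliasSketch and the demo method are hypothetical; only the field declaration is taken from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;

// Hypothetical example class showing that volatile belongs to the field,
// not to the HashMap object the field points at.
class AliasSketch {
    private volatile HashMap<Object, ArrayList<String>> data = new HashMap<>();

    // Returns three flags, all expected to be true.
    static boolean[] demo() {
        AliasSketch s = new AliasSketch();

        // One volatile read; 'alias' is a plain reference to the same map.
        HashMap<Object, ArrayList<String>> alias = s.data;

        // Ordinary HashMap.put: volatile plays no role in this call.
        alias.put("k", new ArrayList<>());
        boolean sameObject = s.data.containsKey("k");

        // Volatile write: only the field is redirected to a new map.
        s.data = new HashMap<>();
        boolean aliasStillOldMap = alias.containsKey("k");
        boolean fieldIsNewMap = !s.data.containsKey("k");

        return new boolean[] {sameObject, aliasStillOldMap, fieldIsNewMap};
    }

    public static void main(String[] args) {
        for (boolean flag : demo()) {
            System.out.println(flag); // prints true three times
        }
    }
}
```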

If you want to synchronise all operations on the reference, the HashMap, and the ArrayList, then use an additional lock object for synchronisation, as in the following code.

private final Object lock = new Object();
private Map<Object, List<String>> map = new HashMap<>();

// access reference
synchronized (lock) {
    map = new HashMap<>();
}

// access reference and HashMap
synchronized (lock) {
    return map.containsKey(42);
}

// access reference, HashMap and ArrayList
synchronized (lock) {
    map.get(42).add("foobar");
}

If the reference is never reassigned, you can synchronise on the HashMap itself instead of using the separate lock object.
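
A minimal sketch of that variant, assuming the field can be declared final; the class name MapAsLockSketch and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class: the map doubles as its own monitor.
class MapAsLockSketch {
    // final: the reference never changes, so every thread locks the same object.
    private final Map<Object, List<String>> map = new HashMap<>();

    // Guards both the map lookup and the change to the list inside it.
    void add(Object key, String value) {
        synchronized (map) {
            map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }

    // Returns a copy so that callers never touch the shared list outside the lock.
    List<String> snapshot(Object key) {
        synchronized (map) {
            List<String> values = map.get(key);
            if (values == null) {
                return new ArrayList<String>();
            }
            return new ArrayList<String>(values);
        }
    }

    public static void main(String[] args) {
        MapAsLockSketch sketch = new MapAsLockSketch();
        sketch.add(42, "foobar");
        System.out.println(sketch.snapshot(42)); // prints [foobar]
    }
}
```

Handing out a copy costs an allocation per read, but it means a reader never observes a list while another thread is halfway through changing it.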

nosid answered Sep 20 '22 13:09