Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java concurrency scenario -- do I need synchronization or not?

Here's the deal. I have a hash map containing data I call "program codes", it lives in an object, like so:

Class Metadata
{
    private HashMap validProgramCodes;
    public HashMap getValidProgramCodes() { return validProgramCodes; }
    public void setValidProgramCodes(HashMap h) { validProgramCodes = h; }
}

I have lots and lots of reader threads each of which will call getValidProgramCodes() once and then use that hashmap as a read-only resource.

So far so good. Here's where we get interesting.

I want to put in a timer which every so often generates a new list of valid program codes (never mind how), and calls setValidProgramCodes.

My theory -- which I need help to validate -- is that I can continue using the code as is, without putting in explicit synchronization. It goes like this: At the time that validProgramCodes are updated, the value of validProgramCodes is always good -- it is a pointer to either the new or the old hashmap. This is the assumption upon which everything hinges. A reader who has the old hashmap is okay; he can continue to use the old value, as it will not be garbage collected until he releases it. Each reader is transient; it will die soon and be replaced by a new one who will pick up the new value.

Does this hold water? My main goal is to avoid costly synchronization and blocking in the overwhelming majority of cases where no update is happening. We only update once per hour or so, and readers are constantly flickering in and out.

like image 752
storkman Avatar asked Nov 18 '08 22:11

storkman


People also ask

When should synchronization be used?

Synchronization is usually needed when you are sharing data between multiple invocations and there is a possibility that the data would be modified resulting in inconsistency. If the data is read-only then you dont need to synchronize. In the code snippet above, there is no data that is being shared.

How do you avoid concurrency issues in Java?

The simplest way to avoid problems with concurrency is to share only immutable data between threads. Immutable data is data which cannot be changed. To make a class immutable define the class and all its fields as final. Also ensure that no reference to fields escape during construction.

What is the difference between synchronization and concurrency?

The word synchronization generally means sharing data between multiple processors or threads, while concurrency refers to a measure of– or the art of improving– how effectively an application allows multiple jobs required by that application (e.g. serving web page requests from a web server) to run simultaneously.

What is the need of synchronization in Java?

We need to synchronize the shared resources to ensure that at a time only one thread is able to access the shared resource. If an Object is shared by multiple threads then there is need of synchronization in order to avoid the Object's state to be getting corrupted. Synchronization is needed when Object is mutable.


1 Answers

Use Volatile

Is this a case where one thread cares what another is doing? Then the JMM FAQ has the answer:

Most of the time, one thread doesn't care what the other is doing. But when it does, that's what synchronization is for.

In response to those who say that the OP's code is safe as-is, consider this: There is nothing in Java's memory model that guarantees that this field will be flushed to main memory when a new thread is started. Furthermore, a JVM is free to reorder operations as long as the changes aren't detectable within the thread.

Theoretically speaking, the reader threads are not guaranteed to see the "write" to validProgramCodes. In practice, they eventually will, but you can't be sure when.

I recommend declaring the validProgramCodes member as "volatile". The speed difference will be negligible, and it will guarantee the safety of your code now and in future, whatever JVM optimizations might be introduced.

Here's a concrete recommendation:

import java.util.Collections;

class Metadata {

    private volatile Map validProgramCodes = Collections.emptyMap();

    public Map getValidProgramCodes() { 
      return validProgramCodes; 
    }

    public void setValidProgramCodes(Map h) { 
      if (h == null)
        throw new NullPointerException("validProgramCodes == null");
      validProgramCodes = Collections.unmodifiableMap(new HashMap(h));
    }

}

Immutability

In addition to wrapping it with unmodifiableMap, I'm copying the map (new HashMap(h)). This makes a snapshot that won't change even if the caller of setter continues to update the map "h". For example, they might clear the map and add fresh entries.

Depend on Interfaces

On a stylistic note, it's often better to declare APIs with abstract types like List and Map, rather than a concrete types like ArrayList and HashMap. This gives flexibility in the future if concrete types need to change (as I did here).

Caching

The result of assigning "h" to "validProgramCodes" may simply be a write to the processor's cache. Even when a new thread starts, "h" will not be visible to a new thread unless it has been flushed to shared memory. A good runtime will avoid flushing unless it's necessary, and using volatile is one way to indicate that it's necessary.

Reordering

Assume the following code:

HashMap codes = new HashMap();
codes.putAll(source);
meta.setValidProgramCodes(codes);

If setValidCodes is simply the OP's validProgramCodes = h;, the compiler is free to reorder the code something like this:

 1: meta.validProgramCodes = codes = new HashMap();
 2: codes.putAll(source);

Suppose after execution of writer line 1, a reader thread starts running this code:

 1: Map codes = meta.getValidProgramCodes();
 2: Iterator i = codes.entrySet().iterator();
 3: while (i.hasNext()) {
 4:   Map.Entry e = (Map.Entry) i.next();
 5:   // Do something with e.
 6: }

Now suppose that the writer thread calls "putAll" on the map between the reader's line 2 and line 3. The map underlying the Iterator has experienced a concurrent modification, and throws a runtime exception—a devilishly intermittent, seemingly inexplicable runtime exception that was never produced during testing.

Concurrent Programming

Any time you have one thread that cares what another thread is doing, you must have some sort of memory barrier to ensure that actions of one thread are visible to the other. If an event in one thread must happen before an event in another thread, you must indicate that explicitly. There are no guarantees otherwise. In practice, this means volatile or synchronized.

Don't skimp. It doesn't matter how fast an incorrect program fails to do its job. The examples shown here are simple and contrived, but rest assured, they illustrate real-world concurrency bugs that are incredibly difficult to identify and resolve due to their unpredictability and platform-sensitivity.

Additional Resources

  • The Java Language Specification - 17 Threads and Locks sections: §17.3 and §17.4
  • The JMM FAQ
  • Doug Lea's concurrency books
like image 65
erickson Avatar answered Oct 17 '22 20:10

erickson