
Use of AtomicReference in an Effective Java example

In Effective Java, Item 74, Joshua Bloch demonstrates the safe use of a parameterless constructor with a separate initialization method in the following code snippet.

import java.util.concurrent.atomic.AtomicReference;

abstract class AbstractFoo {
    private int x, y; // Our state

    // This enum and field are used to track initialization
    private enum State {
        NEW, INITIALIZING, INITIALIZED
    }

    private final AtomicReference<State> init =
            new AtomicReference<State>(State.NEW);

    public AbstractFoo(int x, int y) {
        initialize(x, y);
    }

    // This constructor and the following method allow
    // subclass's readObject method to initialize our state.
    protected AbstractFoo() {
    }

    protected final void initialize(int x, int y) {
        if (!init.compareAndSet(State.NEW, State.INITIALIZING))
            throw new IllegalStateException("Already initialized");
        this.x = x;
        this.y = y;
        // ...Do anything else the original constructor did
        init.set(State.INITIALIZED);
    }

    // These methods provide access to internal state so it can
    // be manually serialized by subclass's writeObject method.
    protected final int getX() {
        checkInit();
        return x;
    }

    protected final int getY() {
        checkInit();
        return y;
    }

    // Must call from all public and protected instance methods
    private void checkInit() {
        if (init.get() != State.INITIALIZED)
            throw new IllegalStateException("Uninitialized");
    }
}

What puzzles me is the use of AtomicReference. His explanation reads:

Note that the initialized field is an atomic reference (java.util.concurrent.atomic.AtomicReference). This is necessary to ensure object integrity in the face of a determined adversary. In the absence of this precaution, if one thread were to invoke initialize on an instance while a second thread attempted to use it, the second thread might see the instance in an inconsistent state.

I fail to understand how this protects the object against being used in an inconsistent state. In my understanding, if one thread runs initialize() and a second one runs any of the accessors, there cannot be a situation in which the second thread reads the value of the x or y field without initialization being marked as completed.

Another possible issue I can see here is that AtomicReference is meant to be thread-safe (probably with a volatile field inside). This would ensure that a change to the value of the init variable becomes visible to other threads immediately, which would prevent an IllegalStateException when the initialization has in fact been done but the thread executing the accessor methods cannot see it yet. But is this what the author is talking about?

Is my reasoning correct? Or is there another explanation for this?

ps-aux asked Mar 12 '14 21:03


1 Answer

This is a long answer, and it sounds like you already have some grasp of the issue, so I'm adding headers to try and make it easier for you to fast-forward past the parts you already know.

The problem

Multithreading is a bit tricky, and one of the trickier bits is that the compiler/JVM is allowed to reorder operations across threads in the absence of synchronization. That is, if thread A does:

field1 = "hello";
field2 = "world";

and thread B does:

System.out.println(field2);
System.out.println(field1);

Then it's possible that thread B would print out "world" followed by "null" (assuming that's what field1 was initially). This "shouldn't" happen, because you set field2 after field1 in the code, so if field2 has been set, then surely field1 must be, too? Nope! The compiler is allowed to reorder things so that thread B sees the assignments as happening like this:

field2 = "world";
field1 = "hello";

(It could even see field2 = "world" and never see field1 = "hello", or it could never see either assignment, or other possibilities.) There are various reasons why this could happen: it might be more efficient due to how the compiler wants to use registers, or it could be that it's a more efficient way to share memory across CPU cores. Point is, it's allowed.

... even with constructors

One of the more un-intuitive concepts here is that a constructor generally doesn't provide any special guarantees for reordering (except, it does for final fields). So don't think of the constructor as anything other than a method, and don't think of a method as anything other than a grouping of actions, and don't think of an object's state as anything other than a grouping of fields. It seems obvious that an assignment in a constructor would be seen by anyone who has that object (after all, how can you read an object's state before you finished making the object?), but that notion is incorrect due to reorderings. What you think of as foo = new ConcreteFoo() is actually:

  • allocate memory for a new ConcreteFoo (call it this); call initialize, do some stuff...
  • this.x = x
  • this.y = y
  • foo = <the newly constructed object>

You can see how the bottom three assignments could be reordered; thread B could see them as happening in various ways, including (but not limited to):

  • foo = <the newly constructed object, with default values for all fields>
  • foo.getX() which returns 0
  • this.x = x (possibly a long time later)
  • (this.y = y is never seen by thread B)

Happens-before relationships

However, there are ways to solve that problem. Let's put the AtomicReference to the side for a moment...

The way to solve the problem is with a happens-before (HB) relationship. If there is an HB relationship between the writes and the reads, then the compiler and the CPU are not allowed to do the reordering above.

Specifically:

  • if thread A does action A
  • and thread B does action B
  • and action A happens-before action B
  • then when thread B does action B, it must see at least all of the actions that thread A saw as of action A. In other words, thread B sees the world at least as "up to date" as thread A saw it.

That's pretty abstract, so let me make it more concrete. One way you can establish a happens-before edge is with a volatile field: there's a HB relationship between one thread writing to that field and another thread reading from it. So, if thread A writes to a volatile field, and thread B reads from that same field, then thread B must see the world as thread A saw it at the time of the write (well, at least as recently as that: thread B could also see some subsequent actions).

So, let's say field2 were volatile. In that case:

Thread 1:
field1 = "hello";
field2 = "world"; // point 1

Thread 2:
System.out.println(field2); // point 2
System.out.println(field1); // point 3

Here, point 1 "starts" a HB relationship that point 2 "finishes." That means that as of point 2, thread 2 must see everything that thread 1 saw at point 1 — specifically, the assignment field1 = "hello" (as well as field2 = "world"). And so, thread 2 will print out "world\nhello" as expected.
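A minimal sketch of the volatile version above, not taken from the original answer. The class name VolatileVisibilityDemo and the methods write, read and run are made up for illustration. One deliberate difference from the snippet: the reader loops until it actually observes the write to field2, because the happens-before edge only exists once the volatile read sees the value written at point 1.

```java
class VolatileVisibilityDemo {
    private String field1;          // plain field, no guarantees on its own
    private volatile String field2; // volatile field that carries the HB edge

    // Thread 1 from the text.
    void write() {
        field1 = "hello";
        field2 = "world"; // point 1: volatile write
    }

    // Thread 2 from the text, with an added wait (an assumption of this sketch).
    String read() {
        String second;
        while ((second = field2) == null) { // point 2: volatile read
            Thread.yield();
        }
        // point 3: the volatile read above saw "world", so "hello" is visible too
        return second + "\n" + field1;
    }

    static String run() {
        VolatileVisibilityDemo demo = new VolatileVisibilityDemo();
        Thread writer = new Thread(demo::write);
        writer.start();
        String result = demo.read();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "world" then "hello"
    }
}
```

If field2 were not volatile, the loop in read would have no visibility guarantee at all and, even if it ended, field1 could still be read as null.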

AtomicReferences

So, what does all this have to do with AtomicReference? The secret lies in the javadoc for the java.util.concurrent.atomic package:

The memory effects for accesses and updates of atomics generally follow the rules for volatiles, as stated in section 17.4 of The Java™ Language Specification.

In other words, there is a HB relationship between myAtomicRef.set and myAtomicRef.get. Or, as in the example above, between myAtomicRef.compareAndSet and myAtomicRef.get.
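As a rough illustration of that set/get relationship, here is a small sketch in which a plain int is published through an AtomicReference. The class AtomicPublishDemo, its field names and the "empty"/"ready" status strings are all hypothetical; only AtomicReference.set and AtomicReference.get come from the JDK.

```java
import java.util.concurrent.atomic.AtomicReference;

class AtomicPublishDemo {
    private int payload; // plain, non-volatile field
    private final AtomicReference<String> status =
            new AtomicReference<String>("empty");

    void publish(int value) {
        payload = value;     // plain write
        status.set("ready"); // has the memory effects of a volatile write
    }

    int awaitPayload() {
        // get() has the memory effects of a volatile read
        while (!"ready".equals(status.get())) {
            Thread.yield();
        }
        // Once "ready" has been seen, the earlier write to payload is visible.
        return payload;
    }

    static int run() {
        final AtomicPublishDemo demo = new AtomicPublishDemo();
        Thread publisher = new Thread(() -> demo.publish(42));
        publisher.start();
        int seen = demo.awaitPayload();
        try {
            publisher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

The payload field itself needs no synchronization here, as long as every reader goes through status.get() first and only proceeds once it has seen "ready".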

Back to AbstractFoo

Without the AtomicReference actions, there are no HB relationships established in AbstractFoo. If one thread assigns a value to this.x (as it does in initialize, called by the constructor) and another thread reads the value this.x (as it does during getX), you could have the reordering problem mentioned above, and have getX return the default value for x (that is, 0).

But AbstractFoo does take specific measures to establish HB relationships: initialize calls init.set after it assigns this.x = x, and getX calls init.get (via checkInit) before it reads this.x to return it (similarly with y). That establishes the HB relationship, ensuring that thread B calling getX, by the time it reads this.x, sees the world as thread A saw it at the end of initialize, when it called init.set; specifically, thread B sees the action this.x = x before it performs the action return [this.]x. If checkInit instead sees NEW or INITIALIZING, it throws, so x is never read without that guarantee.
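To see both halves of the pattern at work, here is a stripped-down sketch, not Bloch's code. InitGuardDemo, awaitX and run are hypothetical names, and the class keeps only x. Several threads race to call initialize, while the main thread retries getX until it stops throwing.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class InitGuardDemo {
    private enum State { NEW, INITIALIZING, INITIALIZED }

    private int x; // plain field, guarded by init
    private final AtomicReference<State> init =
            new AtomicReference<State>(State.NEW);

    void initialize(int x) {
        // Only one caller can move NEW -> INITIALIZING; the rest throw.
        if (!init.compareAndSet(State.NEW, State.INITIALIZING))
            throw new IllegalStateException("Already initialized");
        this.x = x;
        init.set(State.INITIALIZED); // publishes the write to x
    }

    int getX() {
        if (init.get() != State.INITIALIZED)
            throw new IllegalStateException("Uninitialized");
        return x;
    }

    // Retries getX until initialization has been observed.
    int awaitX() {
        while (true) {
            try {
                return getX();
            } catch (IllegalStateException notYetInitialized) {
                Thread.yield();
            }
        }
    }

    // Returns { number of successful initialize calls, value read from getX }.
    static int[] run() {
        final InitGuardDemo foo = new InitGuardDemo();
        final AtomicInteger successes = new AtomicInteger();
        Thread[] initializers = new Thread[4];
        for (int i = 0; i < initializers.length; i++) {
            initializers[i] = new Thread(() -> {
                try {
                    foo.initialize(7);
                    successes.incrementAndGet();
                } catch (IllegalStateException lostTheRace) {
                    // another thread initialized the object first
                }
            });
            initializers[i].start();
        }
        int observed = foo.awaitX();
        for (Thread t : initializers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { successes.get(), observed };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println(result[0] + " " + result[1]); // prints "1 7"
    }
}
```

The compareAndSet is what keeps two initializers from interleaving their writes, and the set/get pair is what keeps a reader from seeing INITIALIZED without also seeing x.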

Further reading

There are a few other ways to establish happens-before edges, but that's out of scope for this answer. They're listed in JLS 17.4.4.

And the obligatory reference to JCIP (Java Concurrency in Practice), a great book for multithreading issues in general, and their applicability to Java in particular.

yshavit answered Oct 27 '22 00:10