The question is rather easy, in a way. Suppose I have this class:
static class Singleton {
}
And I want to provide a singleton factory for it. I can do the (probably) obvious. I am not going to mention the enum possibility or any other, as they are of no interest to me.
static final class SingletonFactory {
private static volatile Singleton singleton;
public static Singleton getSingleton() {
if (singleton == null) { // volatile read
synchronized (SingletonFactory.class) {
if (singleton == null) { // volatile read
singleton = new Singleton(); // volatile write
}
}
}
return singleton; // volatile read
}
}
I can get away from one volatile read
with the price of higher code complexity:
public static Singleton improvedGetSingleton() {
Singleton local = singleton; // volatile read
if (local == null) {
synchronized (SingletonFactory.class) {
local = singleton; // volatile read
if (local == null) {
local = new Singleton();
singleton = local; // volatile write
}
}
}
return local; // NON volatile read
}
This is pretty much what our code has been using for close to a decade now.
The question is can I make this even faster with release/acquire
semantics added in java-9
via VarHandle
:
static final class SingletonFactory {
private static final SingletonFactory FACTORY = new SingletonFactory();
private Singleton singleton;
private static final VarHandle VAR_HANDLE;
static {
try {
VAR_HANDLE = MethodHandles.lookup().findVarHandle(SingletonFactory.class, "singleton", Singleton.class);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private static Singleton getInnerSingleton() {
Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY); // acquire
if (localSingleton == null) {
synchronized (SingletonFactory.class) {
localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY); // acquire
if (localSingleton == null) {
localSingleton = new Singleton();
VAR_HANDLE.setRelease(FACTORY, localSingleton); // release
}
}
}
return localSingleton;
}
}
Would this be a valid and correct implementation?
Yes, this is correct, and it is present on Wikipedia. (It doesn't matter that the field is volatile, since it is only ever accessed from VarHandle
.)
If the first read sees a stale value, it enters the synchronized block. Since synchronized blocks involve happen-before relationships, the second read will always see the written value. Even on Wikipedia it says sequential consistency is lost, but it refers to the fields; synchronized blocks are sequentially consistent, even though they use release-acquire semantics.
So the second null check will never succeed, and the object is never instantiated twice.
It is guaranteed that the second read will see the written value, because it is executed with the same lock held as when the value was computed and stored in the variable.
On x86 all loads have acquire semantics, so the only overhead would be the null check. Release-acquire allows values to be seen eventually (that's why the relevant method was called lazySet
before Java 9, and its Javadoc used that exact same word). This is prevented in this scenario by the synchronized block.
Instructions may not be reordered out and into synchronized blocks.
I am going to try and answer this myself... TL;DR : This is a correct implementation, but potentially more expensive than the one with volatile?.
Though this looks better, it can under-perform in some case. I am going to push myself against the famous IRIW example
: independent reads of independent writes:
volatile x, y
-----------------------------------------------------
x = 1 | y = 1 | int r1 = x | int r3 = y
| | int r2 = y | int r4 = x
This reads as :
ThreadA
and ThreadB
) that write to x
and y
(x = 1
and y = 1
)ThreadC
and ThreadD
) that read x
and y
, but in reverse order.Because x
and y
are volatile
a result as below is impossible:
r1 = 1 (x) r3 = 1 (y)
r2 = 0 (y) r4 = 0 (x)
This is what sequential consistency
of volatile
guarantees. If ThreadC
observed the write to x
(it saw that x = 1
), it means that ThreadD
MUST observe the same x = 1
. This is because in a sequential consistent execution writes happens as-if in global order, or it happens as-if atomically, everywhere. So every single thread must see the same value. So this execution is impossible, according to the JLS too:
If a program has no data races, then all executions of the program will appear to be sequentially consistent.
Now if we move the same example to release/acquire
(x = 1
and y = 1
are releases while the other reads are acquires):
non-volatile x, y
-----------------------------------------------------
x = 1 | y = 1 | int r1 = x | int r3 = y
| | int r2 = y | int r4 = x
A result like:
r1 = 1 (x) r3 = 1 (y)
r2 = 0 (y) r4 = 0 (x)
is possible and allowed. This breaks sequential consistency
and this is normal, since release/acquire
is "weaker". For x86
release/acquire does not impose a StoreLoad
barrier , so an acquire
is allowed to go above (reorder) an release
(unlike volatile
which prohibits this). In simpler words volatile
s themselves are not allowed to be re-ordered, while a chain like:
release ... // (STORE)
acquire ... // this acquire (LOAD) can float ABOVE the release
is allowed to be "inverted" (reordered), since StoreLoad
is not mandatory.
Though this is somehow wrong and irrelevant, because JLS
does not explain things with barriers. Unfortunately, these are not yet documented in the JLS either...
If I extrapolate this to the example of SingletonFactory
, it means that after a release :
VAR_HANDLE.setRelease(FACTORY, localSingleton);
any other thread that does an acquire
:
Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);
is not guaranteed to read the value from the release (a non-null Singleton
).
Think about it: in case of volatile
, if one thread has seen the volatile write, every other thread will, for sure, see it too. There is no such guarantee with release/acquire
.
As such, with release/acquire
every thread might need to enter the synchronized block. And this might happen for many threads, because it's really unknown when the store that happened in the release
will be visible by the load acquire
.
And even if the synchronized
itself does offer happens-before order, this code, at least for some time (until the release is observed) is going to perform worse? (I assume so): every thread competing to enter the synchronized block.
So in the end - this is about what is more expensive? A volatile store
or an eventually seen release
. I have no answer to this one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With