 

Implementing an acquire for a release from Unsafe.putOrdered*()?

What do you think is the best correct way for implementing the acquire part of a release/acquire pair in Java?

I'm trying to model some of the actions in an application of mine using classic release/acquire semantics (without StoreLoad and without sequential consistency across threads).

There are a couple of ways to achieve the rough equivalent of a store-release in the JDK. java.util.concurrent.Atomic*.lazySet() and the underlying sun.misc.Unsafe.putOrdered*() are the most often cited approaches. However, there's no obvious way to implement a load-acquire.

  • The JDK APIs that offer lazySet() mostly use volatile variables internally, so their store-releases are paired with volatile loads. In theory, volatile loads should be more expensive than load-acquires and should not provide anything more than a pure load-acquire in the context of a preceding store-release.

  • sun.misc.Unsafe does not provide getAcquire*() equivalents of the putOrdered*() methods, even though such acquire methods are planned for the upcoming VarHandles API.

  • Something that sounds like it would work is a plain load, followed by sun.misc.Unsafe.loadFence(). It's somewhat disconcerting that I haven't seen this anywhere else. This may be related to the fact that it's a pretty ugly hack.
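For comparison, here is a minimal sketch of both acquire options written against the VarHandles API mentioned above, which shipped in JDK 9 (so it postdates the JDK 8 situation described in the question). The class and field names are hypothetical. setRelease()/getAcquire() form a genuine release/acquire pair; the second reader approximates the "plain load + loadFence()" idea. Two substitutions are made for this sketch: VarHandle.acquireFence() stands in for sun.misc.Unsafe.loadFence(), and getOpaque() stands in for a fully plain read so that the JIT cannot hoist the poll out of the spin loop.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical one-shot message slot using the JDK 9+ VarHandle API.
class AcquireReleaseSlot {
    private int payload;   // written and read with plain accesses
    private int ready;     // accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseSlot.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int value) {
        payload = value;              // plain store
        READY.setRelease(this, 1);    // store-release, like lazySet()/putOrdered*()
    }

    // Option 1: a real load-acquire.
    int awaitWithGetAcquire() {
        while ((int) READY.getAcquire(this) != 1) {
            Thread.onSpinWait();
        }
        return payload;               // ordered after the acquiring load
    }

    // Option 2: a weaker load followed by an acquire fence (assumption: getOpaque()
    // is used instead of a plain read so the poll stays inside the loop).
    int awaitWithFence() {
        while ((int) READY.getOpaque(this) != 1) {
            Thread.onSpinWait();
        }
        VarHandle.acquireFence();     // keeps the payload load after the flag load
        return payload;
    }

    static int demo(boolean useFence) {
        AcquireReleaseSlot slot = new AcquireReleaseSlot();
        int[] seen = new int[1];
        Thread consumer = new Thread(() ->
                seen[0] = useFence ? slot.awaitWithFence() : slot.awaitWithGetAcquire());
        consumer.start();
        slot.publish(7);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Both paths return the published value; the difference is only in which primitive supplies the acquire ordering.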

P.S. I understand well that these mechanisms are not covered by the JMM, that they are not sufficient for maintaining sequential consistency, and that the actions they create are not synchronization actions (e.g. they break IRIW). I also understand that the store-releases provided by Atomic*/Unsafe are most often used either for eagerly nulling out references or in producer/consumer scenarios, as an optimized message-passing mechanism for some important index.

Dimitar Dimitrov asked May 08 '16 at 22:05



2 Answers

Volatile read is exactly what you are looking for.

In fact, paired volatile operations already have release/acquire semantics (otherwise a happens-before edge between a volatile write and a subsequent volatile read would be impossible). But volatile operations must do more than that: they must also take part in a single total synchronization order. That's why a StoreLoad barrier is inserted after a volatile write: it guarantees a total order of volatile writes to different locations, so all threads observe those writes in the same order.
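A hypothetical store-buffering (Dekker-style) litmus test shows what that StoreLoad barrier buys. Because x and y are volatile, the JMM forbids the outcome where both threads read 0, so the count returned here must be 0. If the two writes were instead done with lazySet()/setRelease() (release only, no StoreLoad), that outcome would be allowed and can occur on real hardware, including x86. The harness (a fresh pair of threads per trial) is a simplification and is far too slow to reliably expose the weaker behaviour.

```java
// Hypothetical store-buffering litmus test with volatile (sequentially consistent) accesses.
class StoreBufferingLitmus {
    static volatile int x, y;
    static int r1, r2;   // published to the main thread via Thread.join()

    // Runs the test `trials` times and counts outcomes where both threads read 0.
    static int countBothZero(int trials) {
        int bothZero = 0;
        for (int i = 0; i < trials; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { y = 1; r2 = x; });
            t1.start();
            t2.start();
            join(t1);
            join(t2);
            if (r1 == 0 && r2 == 0) {
                bothZero++;   // forbidden for volatile x and y
            }
        }
        return bothZero;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```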

A volatile read has acquire semantics: see the proof in the HotSpot codebase. There is also a direct recommendation by Doug Lea in the JSR-133 Cookbook (LoadLoad and LoadStore barriers after each volatile read).
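A minimal sketch of the pairing recommended here: the producer publishes with lazySet() (a store-release with no trailing StoreLoad barrier), and the consumer uses an ordinary volatile read as the acquire half. The class and field names are hypothetical; AtomicIntegerFieldUpdater is used only because it lets the same field be written with lazySet() and read as a plain volatile field.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical mailbox: store-release via lazySet(), load-acquire via a volatile read.
class VolatileReadMailbox {
    private long payload;          // plain field
    private volatile int ready;    // field updaters require a volatile int field

    private static final AtomicIntegerFieldUpdater<VolatileReadMailbox> READY =
            AtomicIntegerFieldUpdater.newUpdater(VolatileReadMailbox.class, "ready");

    void publish(long value) {
        payload = value;           // plain store
        READY.lazySet(this, 1);    // store-release: the payload store cannot sink below it
    }

    long awaitAndRead() {
        while (ready != 1) {       // volatile read: the acquire half of the pair
            Thread.onSpinWait();
        }
        return payload;            // cannot be reordered before the volatile read
    }

    static long demo(long value) {
        VolatileReadMailbox box = new VolatileReadMailbox();
        long[] seen = new long[1];
        Thread consumer = new Thread(() -> seen[0] = box.awaitAndRead());
        consumer.start();
        box.publish(value);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```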

Unsafe.loadFence() also has acquire semantics (proof), but it is not used to read a value (a plain volatile read does that just as well); it is used to prevent plain reads from being reordered with a subsequent volatile read. StampedLock uses it this way for optimistic reading (see the StampedLock#validate method implementation and its usages).
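The optimistic-read idiom that depends on validate() looks roughly like this. It is a minimal sketch modelled on the Point example in the StampedLock javadoc, with a hypothetical class name. The plain reads of x and y happen before validate(), which issues a load/acquire fence (Unsafe.loadFence() in JDK 8) before re-checking the lock state, so those reads cannot drift past the check.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical point guarded by a StampedLock with optimistic reads.
class OptimisticPoint {
    private double x, y;
    private final StampedLock lock = new StampedLock();

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();   // no lock is actually acquired
        double curX = x, curY = y;               // plain reads that may race with move()
        if (!lock.validate(stamp)) {             // fence, then re-check the stamp
            stamp = lock.readLock();             // fall back to a real read lock
            try {
                curX = x;
                curY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(curX, curY);
    }
}
```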

Update after discussion in comments.

Let's check whether Unsafe#loadFence() and a volatile read are the same and both have acquire semantics.

I'm looking at the HotSpot C1 compiler source code to avoid reading through all the optimizations in C2. C1 transforms the bytecode (strictly speaking, not the bytecode itself but its high-level intermediate representation) into LIR (Low-level Intermediate Representation) and then translates the graph into actual opcodes for the target microarchitecture.

Unsafe#loadFence is an intrinsic with the _loadFence alias. In the C1 LIR generator it generates this:

case vmIntrinsics::_loadFence :
    if (os::is_MP()) __ membar_acquire();

where __ is a macro for LIR generation.

Now let's look at the volatile read implementation in the same LIR generator. It tries to insert null checks, handles IRIW, checks whether we are on 32-bit x86 and reading a 64-bit value (which needs special handling via SSE/FPU) and, finally, leads us to the same code:

if (is_volatile && os::is_MP()) {
    __ membar_acquire();
}

The assembler generator then inserts platform-specific acquire instruction(s) here.

Looking at the specific implementations (no links here, but all can be found in src/cpu/{$cpu_model}/vm/c1_LIRAssembler_{$cpu_model}.cpp):

  • SPARC

    void LIR_Assembler::membar_acquire() {
        // no-op on TSO
    }
    
  • x86

    void LIR_Assembler::membar_acquire() {
        // No x86 machines currently require load fences
    }
    
  • AArch64 (weak memory model, so barriers are required)

    void LIR_Assembler::membar_acquire() {
        __ membar(Assembler::LoadLoad|Assembler::LoadStore);
    }
    

    According to the AArch64 architecture description, such a membar is emitted as a dmb ishld instruction after the load.

  • PowerPC (also weak memory model)

    void LIR_Assembler::membar_acquire() {
        __ acquire();
    }
    

    which is then translated into the PowerPC-specific lwsync instruction. According to the comments in the source, lwsync behaves as follows:

    lwsync orders Store|Store, Load|Store, Load|Load, but not Store|Load

    Since PowerPC has no weaker barrier, this is the only choice for implementing acquire semantics on PowerPC.
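To compare the two paths on real hardware, one could compile two tiny accessors and inspect the JIT output, for example with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (which needs the hsdis disassembler plugin) plus -XX:TieredStopAtLevel=1 to stay in C1. This is a hypothetical sketch: the class is made up, and VarHandle.acquireFence() (JDK 9+, implemented via the same load-fence intrinsic) stands in for Unsafe.loadFence(). Per the membar_acquire() implementations quoted above, on x86 both methods should reduce to a plain load with no fence instruction, while on AArch64 and PowerPC a barrier should appear after the load.

```java
import java.lang.invoke.VarHandle;

// Hypothetical accessors whose JIT-compiled code can be compared side by side.
class AcquireShapes {
    private volatile int volatileValue;
    private int plainValue;

    // Volatile read: per the C1 code quoted above, a load followed by membar_acquire().
    int readVolatile() {
        return volatileValue;
    }

    // Plain read followed by an explicit acquire fence (stand-in for Unsafe.loadFence()).
    int readPlainThenFence() {
        int v = plainValue;
        VarHandle.acquireFence();
        return v;
    }

    void set(int v) {
        volatileValue = v;
        plainValue = v;
    }

    // Warm-up loop so the JIT actually compiles both accessors.
    public static void main(String[] args) {
        AcquireShapes s = new AcquireShapes();
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            s.set(i);
            sum += s.readVolatile() + s.readPlainThenFence();
        }
        System.out.println(sum);
    }
}
```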

Conclusions

Volatile reads and Unsafe#loadFence() are equivalent in terms of memory ordering (though perhaps not in terms of possible compiler optimizations). On the most popular architecture, x86, the acquire barrier is a no-op (as it is on SPARC), and PowerPC is the only supported architecture that has no precise acquire barrier.

qwwdfsad answered Oct 14 '22 at 17:10



Depending on your exact requirements, a non-volatile load, possibly followed by a volatile load, is the best you can get in Java.

You can do this with a combination of plain and volatile Unsafe reads:

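// theUnsafe: a sun.misc.Unsafe instance; offset: from theUnsafe.objectFieldOffset(field)
int permits = theUnsafe.getInt(object, offset);              // plain (non-volatile) load
if (!enough(permits))                                         // enough(): your own check
    permits = theUnsafe.getVolatileInt(object, offset);       // volatile load only when needed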

This pattern can be used in ring buffers to minimise churn of cache lines.
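The same idea can be expressed without sun.misc.Unsafe on JDK 9+ by using a VarHandle's plain get() and getVolatile() access modes. This is a hypothetical sketch: the class, the field and hasAtLeast() (standing in for enough()) are illustrative names, not part of any library.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical permit counter: cheap plain read first, volatile read only if needed.
class PermitCounter {
    private int permits;

    private static final VarHandle PERMITS;
    static {
        try {
            PERMITS = MethodHandles.lookup()
                    .findVarHandle(PermitCounter.class, "permits", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Producer side: atomic add with volatile semantics; returns the previous count.
    int release(int n) {
        return (int) PERMITS.getAndAdd(this, n);
    }

    boolean hasAtLeast(int needed) {
        int p = (int) PERMITS.get(this);              // plain read: cheap, possibly stale
        if (p < needed) {
            p = (int) PERMITS.getVolatile(this);      // re-check with a volatile read
        }
        return p >= needed;
    }
}
```

When the plain read already reports enough permits, no ordering with the writer is established, so this fits only cases where a possibly stale but sufficient value is acceptable.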

Peter Lawrey answered Oct 14 '22 at 16:10
