This question is not about the difference between compareAndSwapInt and weakCompareAndSwapInt - I know what a spurious failure is and why it happens on LL/SC. My question is: if I'm on Intel x86 and using Java 9 (build 149), why is there a difference between their assembly code?
public class WeakVsNonWeak {
static jdk.internal.misc.Unsafe UNSAFE = jdk.internal.misc.Unsafe.getUnsafe();
public static void main(String[] args) throws NoSuchFieldException, SecurityException {
Holder h = new Holder();
h.setValue(33);
Class<?> holderClass = Holder.class;
long valueOffset = UNSAFE.objectFieldOffset(holderClass.getDeclaredField("value"));
int result = 0;
for (int i = 0; i < 30_000; ++i) {
result = strong(h, valueOffset);
}
System.out.println(result);
}
private static int strong(Holder h, long offset) {
int sum = 0;
for (int i = 33; i < 11_000; ++i) {
boolean result = UNSAFE.weakCompareAndSwapInt(h, offset, i, i + 1);
if (!result) {
sum++;
}
}
return sum;
}
public static class Holder {
private int value;
public int getValue() {
return value;
}
public void setValue(int value) {
this.value = value;
}
}
}
Running with:
java -XX:-TieredCompilation
-XX:CICompilerCount=1
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintIntrinsics
-XX:+PrintAssembly
--add-opens java.base/jdk.internal.misc=ALL-UNNAMED
WeakVsNonWeak
Output of compareAndSwapInt (relevant parts):
0x0000000109f0f4b8: movabs $0x111927c18,%rsi ; {metadata({method} {0x0000000111927c18} 'compareAndSwapInt' '(Ljava/lang/Object;JII)Z' in 'jdk/internal/misc/Unsafe')}
0x0000000109f0f4c2: mov %r15,%rdi
0x0000000109f0f4c5: test $0xf,%esp
0x0000000109f0f4cb: je 0x0000000109f0f4e3
0x0000000109f0f4d1: sub $0x8,%rsp
0x0000000109f0f4d5: callq 0x00000001098569d2 ; {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
0x0000000109f0f4da: add $0x8,%rsp
0x0000000109f0f4de: jmpq 0x0000000109f0f4e8
0x0000000109f0f4e3: callq 0x00000001098569d2 ; {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
0x0000000109f0f4e8: pop %r9
0x0000000109f0f4ea: pop %r8
0x0000000109f0f4ec: pop %rcx
0x0000000109f0f4ed: pop %rdx
0x0000000109f0f4ee: pop %rsi
0x0000000109f0f4ef: lea 0x210(%r15),%rdi
0x0000000109f0f4f6: movl $0x4,0x288(%r15)
0x0000000109f0f501: callq 0x00000001098fee40 ; {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
0x0000000109f0f506: vzeroupper
0x0000000109f0f509: and $0xff,%eax
0x0000000109f0f50f: setne %al
0x0000000109f0f512: movl $0x5,0x288(%r15)
0x0000000109f0f51d: lock addl $0x0,-0x40(%rsp)
0x0000000109f0f523: cmpl $0x0,-0x3f04dd(%rip) # 0x0000000109b1f050
Output of weakCompareAndSwapInt:
0x000000010b698840: sub $0x18,%rsp
0x000000010b698847: mov %rbp,0x10(%rsp)
0x000000010b69884c: mov %r8d,%eax
0x000000010b69884f: lock cmpxchg %r9d,(%rdx,%rcx,1)
0x000000010b698855: sete %r11b
0x000000010b698859: movzbl %r11b,%r11d ;*invokevirtual compareAndSwapInt {reexecute=0 rethrow=0 return_oop=0}
; - jdk.internal.misc.Unsafe::weakCompareAndSwapInt@7 (line 1369)
I am nowhere near proficient enough to understand the entire output, but I can definitely see the difference between lock addl and lock cmpxchg.
EDIT Peter's answer got me thinking. Let's see if compareAndSwap will be an intrinsic call:
-XX:+PrintIntrinsics -XX:-PrintAssembly
@ 7 jdk.internal.misc.Unsafe::compareAndSwapInt (0 bytes) (intrinsic)
@ 20 jdk.internal.misc.Unsafe::weakCompareAndSwapInt (11 bytes) (intrinsic).
And then run the example twice with/without:
-XX:DisableIntrinsic=_compareAndSwapInt
This is sort of weird: the output is exactly the same (the very same instructions). The only difference is that with the intrinsic enabled I get calls like this:
0x000000010c23e355: callq 0x00000001016569d2 ; {runtime_call SharedRuntime::dtrace_method_entry(JavaThread*, Method*)}
0x000000010c23e381: callq 0x00000001016fee40 ; {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
And disabled:
0x00000001109322d5: callq 0x0000000105c569d2 ; {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}
0x00000001109322e3: callq 0x0000000105c569d2 ; {runtime_call _ZN13SharedRuntime19dtrace_method_entryEP10JavaThreadP6Method}
This is rather intriguing. Shouldn't the intrinsic code be different?
EDIT-2 the8472 makes sense too.
As far as I know, lock addl is a substitute for mfence that flushes the store buffer on x86, so it is indeed about visibility and not atomicity. Right before this instruction there is:
0x00000001133db6f6: movl $0x4,0x288(%r15)
0x00000001133db701: callq 0x00000001060fee40 ; {runtime_call Unsafe_CompareAndSwapInt(JNIEnv_*, _jobject*, _jobject*, long, int, int)}
0x00000001133db706: vzeroupper
0x00000001133db709: and $0xff,%eax
0x00000001133db70f: setne %al
0x00000001133db712: movl $0x5,0x288(%r15)
0x00000001133db71d: lock addl $0x0,-0x40(%rsp)
0x00000001133db723: cmpl $0x0,-0xd0bc6dd(%rip) # 0x000000010631f050
; {external_word}
If you look here, it will delegate to another native call, Atomic::cmpxchg, which seems to be doing the swap atomically.
Why that is not replaced by a direct lock cmpxchg is a mystery to me.
TL;DR You're looking at the wrong place in the assembly output.
Both compareAndSwapInt and weakCompareAndSwapInt calls are compiled to exactly the same ASM sequence on x86-64. However, the methods themselves are compiled differently (but it does not usually matter).
The definitions of compareAndSwapInt and weakCompareAndSwapInt in the source code are different. The former is a native method, while the latter is a Java method.
@HotSpotIntrinsicCandidate
public final native boolean compareAndSwapInt(Object o, long offset,
int expected,
int x);
@HotSpotIntrinsicCandidate
public final boolean weakCompareAndSwapInt(Object o, long offset,
int expected,
int x) {
return compareAndSwapInt(o, offset, expected, x);
}
What you've seen is how these standalone methods are compiled. A native method compiles to a stub that calls a corresponding C function. But this is not what runs in the fast path.
Intrinsic methods are those whose calls are replaced with a HotSpot-specific inline implementation. Note: the calls are replaced, but not the methods themselves.
If you look at the assembly output of your WeakVsNonWeak.strong method, you'll see that it contains the lock cmpxchg instruction, whether it calls UNSAFE.compareAndSwapInt or UNSAFE.weakCompareAndSwapInt.
0x000001bd76170c44: lock cmpxchg %ecx,(%r11)
0x000001bd76170c49: sete %r10b
0x000001bd76170c4d: movzbl %r10b,%r10d ;*invokevirtual compareAndSwapInt
; - WeakVsNonWeak::strong@25 (line 23)
; - WeakVsNonWeak::main@46 (line 14)
0x0000024f56af1097: lock cmpxchg %r11d,(%r8)
0x0000024f56af109c: sete %r10b
0x0000024f56af10a0: movzbl %r10b,%r10d ;*invokevirtual weakCompareAndSwapInt
; - WeakVsNonWeak::strong@25 (line 23)
; - WeakVsNonWeak::main@46 (line 14)
Once the main method is JIT-compiled, the standalone version of Unsafe.* methods will not be called directly.
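If you want to look at call sites rather than at the standalone methods, here is a minimal sketch. It uses the public AtomicInteger API as a stand-in for jdk.internal.misc.Unsafe (that substitution is my assumption, and the class and method names are purely illustrative). It puts a strong and a weak CAS into two small hot methods, so the interesting assembly is the one emitted for these callers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of the original question.
final class CasCallSites {

    // Strong CAS: fails only if the current value differs from the expected one.
    public static int countStrongFailures(AtomicInteger cell, int from, int to) {
        int failures = 0;
        for (int i = from; i < to; ++i) {
            if (!cell.compareAndSet(i, i + 1)) {
                failures++;
            }
        }
        return failures;
    }

    // Weak CAS: the specification allows spurious failures, so retry
    // as long as the value still matches what we expect.
    public static int advanceWeak(AtomicInteger cell, int from, int to) {
        int retries = 0;
        for (int i = from; i < to; ++i) {
            while (!cell.weakCompareAndSetVolatile(i, i + 1)) {
                if (cell.get() != i) {
                    break; // genuine mismatch, retrying would never succeed
                }
                retries++;
            }
        }
        return retries;
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger();
        int failures = 0;
        int retries = 0;
        // Repeat enough times for the JIT to compile both callers.
        for (int run = 0; run < 2_000; ++run) {
            cell.set(33);
            failures += countStrongFailures(cell, 33, 11_000);
            cell.set(33);
            retries += advanceWeak(cell, 33, 11_000);
        }
        System.out.println("strong failures: " + failures
                + ", weak retries: " + retries
                + ", final value: " + cell.get());
    }
}
```

Running this with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly and searching the output for the two caller methods should let you compare the inlined CAS sequences directly. I have not included the expected assembly, since it depends on the JVM build and platform.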
In the first case, a native method is being used. Either the code hasn't been optimised or it's not an intrinsic.
In the second case, an intrinsic has been used to inline the required assembly rather than call a JNI method. I would have thought both cases would do this, but I guess not.
I believe the lock addl is not the atomic op itself but a store-load barrier implementation. The atomic operation happens in the callq.
Since you're already logging with PrintIntrinsics, you should check if it actually gets intrinsified.