Java has two ways of checking whether two booleans differ. You can compare them with !=
, or with ^
(xor). Of course, these two operators produce the same result in all cases. Still, it makes sense for both of them to be included, as discussed, for example, in What's the difference between XOR and NOT-EQUAL-TO?. It even makes sense for developers to prefer one over the other depending on context - sometimes "is exactly one of these booleans true" reads better, and other times "are these two booleans different" communicates intent better. So, perhaps which one to use should be a matter of taste and style.
What surprised me is that javac does not treat these identically! Consider this class:
class Test {
public boolean xor(boolean p, boolean q) {
return p ^ q;
}
public boolean inequal(boolean p, boolean q) {
return p != q;
}
}
Obviously, the two methods have the same visible behavior. But they have different bytecode:
$ javap -c Test
Compiled from "Test.java"
class Test {
Test();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public boolean xor(boolean, boolean);
Code:
0: iload_1
1: iload_2
2: ixor
3: ireturn
public boolean inequal(boolean, boolean);
Code:
0: iload_1
1: iload_2
2: if_icmpeq 9
5: iconst_1
6: goto 10
9: iconst_0
10: ireturn
}
If I had to guess, I'd say that xor
performs better, since it just returns the result of its comparison; adding in a jump and an extra load just seems like wasted work. But instead of guessing, I benchmarked a few billion calls to both methods using Clojure's "criterium" benchmarking tool. It's close enough that while it looks like xor is a bit faster I'm not good enough at statistics to say whether the results are significant:
user=> (let [t (Test.)] (bench (.xor t true false)))
Evaluation count : 4681301040 in 60 samples of 78021684 calls.
Execution time mean : 4.273428 ns
Execution time std-deviation : 0.168423 ns
Execution time lower quantile : 4.044192 ns ( 2.5%)
Execution time upper quantile : 4.649796 ns (97.5%)
Overhead used : 8.723577 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 25.4745 % Variance is moderately inflated by outliers
user=> (let [t (Test.)] (bench (.inequal t true false)))
Evaluation count : 4570766220 in 60 samples of 76179437 calls.
Execution time mean : 4.492847 ns
Execution time std-deviation : 0.162946 ns
Execution time lower quantile : 4.282077 ns ( 2.5%)
Execution time upper quantile : 4.813433 ns (97.5%)
Overhead used : 8.723577 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 22.2554 % Variance is moderately inflated by outliers
Is there some reason to prefer writing one over the other, performance-wise1? Some context in which the difference in their implementation makes one more suitable than the other? Or, does anyone know why javac implements these two identical operations so differently?
1 Of course, I will not recklessly use this information to micro-optimize. I'm just curious how this all works.
Well, I am going to provide how the CPU translates that shortly and update the post, but in the meanwhile, you are looking at waaaay too small difference to care.
byte-code in java is not an indication of how fast (or not) a method will execute, there are two JIT compilers that will make this method look entirely different once they are hot enough. also javac
is known to do very little optimizations once it compiles the code, the real optimizations come from JIT
.
I've put up some tests using JMH
for this using either C1
compiler only or replacing C2
with GraalVM
or no JIT
at all... (lots of testing code follows, you can skip it and just look at the results, this is done using jdk-12
btw). This code is using JMH - the de facto tool to use in java world of micro-benchmarks (which are notoriously error-prone if done by hand).
@Warmup(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
public class BooleanCompare {
public static void main(String[] args) throws Exception {
Options opt = new OptionsBuilder()
.include(BooleanCompare.class.getName())
.build();
new Runner(opt).run();
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
public boolean xor(BooleanExecutionPlan plan) {
return plan.booleans()[0] ^ plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
public boolean plain(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-Xint")
public boolean xorNoJIT(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-Xint")
public boolean plainNoJIT(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
public boolean xorC2Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
public boolean plainC2Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
public boolean xorC1Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
public boolean plainC1Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1,
jvmArgsAppend = {
"-XX:+UnlockExperimentalVMOptions",
"-XX:+EagerJVMCI",
"-Dgraal.ShowConfiguration=info",
"-XX:+UseJVMCICompiler",
"-XX:+EnableJVMCI"
})
public boolean xorGraalVM(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1,
jvmArgsAppend = {
"-XX:+UnlockExperimentalVMOptions",
"-XX:+EagerJVMCI",
"-Dgraal.ShowConfiguration=info",
"-XX:+UseJVMCICompiler",
"-XX:+EnableJVMCI"
})
public boolean plainGraalVM(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
}
And the results:
BooleanCompare.plain avgt 2 3.125 ns/op
BooleanCompare.xor avgt 2 2.976 ns/op
BooleanCompare.plainC1Only avgt 2 3.400 ns/op
BooleanCompare.xorC1Only avgt 2 3.379 ns/op
BooleanCompare.plainC2Only avgt 2 2.583 ns/op
BooleanCompare.xorC2Only avgt 2 2.685 ns/op
BooleanCompare.plainGraalVM avgt 2 2.980 ns/op
BooleanCompare.xorGraalVM avgt 2 3.868 ns/op
BooleanCompare.plainNoJIT avgt 2 243.348 ns/op
BooleanCompare.xorNoJIT avgt 2 201.342 ns/op
I am not a versatile enough person to read assembler, though I sometimes like to do that... Here are some interesting things. If we do:
C1 compiler only with !=
/*
* run many iterations of this with :
* java -XX:+UnlockDiagnosticVMOptions
* -XX:TieredStopAtLevel=1
* "-XX:CompileCommand=print,com/so/BooleanCompare.compare"
* com.so.BooleanCompare
*/
public static boolean compare(boolean left, boolean right) {
return left != right;
}
we get:
0x000000010d1b2bc7: push %rbp
0x000000010d1b2bc8: sub $0x30,%rsp ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
; - com.so.BooleanCompare::compare@0 (line 22)
0x000000010d1b2bcc: cmp %edx,%esi
0x000000010d1b2bce: mov $0x0,%eax
0x000000010d1b2bd3: je 0x000000010d1b2bde
0x000000010d1b2bd9: mov $0x1,%eax
0x000000010d1b2bde: and $0x1,%eax
0x000000010d1b2be1: add $0x30,%rsp
0x000000010d1b2be5: pop %rbp
To me, this code is a bit obvious: put 0 into eax
, compare (edx, esi)
-> if not equal put 1 into eax
. return eax & 1
.
C1 compiler with ^:
public static boolean compare(boolean left, boolean right) {
return left ^ right;
}
# parm0: rsi = boolean
# parm1: rdx = boolean
# [sp+0x40] (sp of caller)
0x000000011326e5c0: mov %eax,-0x14000(%rsp)
0x000000011326e5c7: push %rbp
0x000000011326e5c8: sub $0x30,%rsp ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
; - com.so.BooleanCompare::compare@0 (line 22)
0x000000011326e5cc: xor %rdx,%rsi
0x000000011326e5cf: and $0x1,%esi
0x000000011326e5d2: mov %rsi,%rax
0x000000011326e5d5: add $0x30,%rsp
0x000000011326e5d9: pop %rbp
I don't really know why and $0x1,%esi
is needed here, otherwise this is fairly simple too, I guess.
But if I enable C2 compiler, things are a lot more interesting.
/**
* run with java
* -XX:+UnlockDiagnosticVMOptions
* -XX:CICompilerCount=2
* -XX:-TieredCompilation
* "-XX:CompileCommand=print,com/so/BooleanCompare.compare"
* com.so.BooleanCompare
*/
public static boolean compare(boolean left, boolean right) {
return left != right;
}
# parm0: rsi = boolean
# parm1: rdx = boolean
# [sp+0x20] (sp of caller)
0x000000011a2bbfa0: sub $0x18,%rsp
0x000000011a2bbfa7: mov %rbp,0x10(%rsp)
0x000000011a2bbfac: xor %r10d,%r10d
0x000000011a2bbfaf: mov $0x1,%eax
0x000000011a2bbfb4: cmp %edx,%esi
0x000000011a2bbfb6: cmove %r10d,%eax
0x000000011a2bbfba: add $0x10,%rsp
0x000000011a2bbfbe: pop %rbp
I don't even see the classic epilog push ebp; mov ebp, esp; sub esp, x
, instead something very un-usual (at least for me) via:
sub $0x18,%rsp
mov %rbp,0x10(%rsp)
....
add $0x10,%rsp
pop %rbp
Again, someone more versatile than me, can explain hopefully. Otherwise it's like a better version of the C1
generated:
xor %r10d,%r10d // put zero into r10d
mov $0x1,%eax // put 1 into eax
cmp %edx,%esi // compare edx and esi
cmove %r10d,%eax // conditionally move the contents of r10d into eax
AFAIK cmp/cmove
is better than cmp/je
because of branch-prediction - this is at least what I've read...
XOR with C2 compiler:
public static boolean compare(boolean left, boolean right) {
return left ^ right;
}
0x000000010e6c9a20: sub $0x18,%rsp
0x000000010e6c9a27: mov %rbp,0x10(%rsp)
0x000000010e6c9a2c: xor %edx,%esi
0x000000010e6c9a2e: mov %esi,%eax
0x000000010e6c9a30: and $0x1,%eax
0x000000010e6c9a33: add $0x10,%rsp
0x000000010e6c9a37: pop %rbp
It sure looks like it's almost the same as C1
compiler generated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With