I've been looking at the implementation of ThreadLocal
in the JDK, out of curiosity, and I found this :
/** * Increment i modulo len. */ private static int nextIndex(int i, int len) { return ((i + 1 < len) ? i + 1 : 0); }
It looks fairly obvious that this could be implemented with a simple return (i + 1) % len
, but I think these guys know their stuff. Any idea why they did this ?
This code is highly oriented towards performance, with a custom map for holding thread-local mappings, weak references to help the GC being clever and so on, so I guess this is a matter of performance. Is modulo slow in Java ?
Integer division and modulo are relatively slow because there is no direct hardware support (they compile to multiple instruction sequences). Floating point modulo is fast.
So in simple terms, this should give you a feel for why division and hence modulo is slower: computers still have to do long division in the same stepwise fashion tha you did in grade school.
A modulo-operation is very slow. An if is most likely to be faster than a modulo and more readable.
Modulo operator is expensive but the division is expensive too. So converting your code from using modulo operator to division is not going to optimize your code. Show activity on this post. Modulo can be done with a single processor instruction on most architectures (ex.
%
is avoided for performance reasons in this example.
div
/rem
operations are slower even on CPU architecture level; not only in Java. For example, minimum latency of idiv
instruction on Haswell is about 10 cycles, but only 1 cycle for add
.
Let's benchmark using JMH.
import org.openjdk.jmh.annotations.*; @State(Scope.Benchmark) public class Modulo { @Param("16") int len; int i; @Benchmark public int baseline() { return i; } @Benchmark public int conditional() { return i = (i + 1 < len) ? i + 1 : 0; } @Benchmark public int mask() { return i = (i + 1) & (len - 1); } @Benchmark public int mod() { return i = (i + 1) % len; } }
Results:
Benchmark (len) Mode Cnt Score Error Units Modulo.baseline 16 avgt 10 2,951 ± 0,038 ns/op Modulo.conditional 16 avgt 10 3,517 ± 0,051 ns/op Modulo.mask 16 avgt 10 3,765 ± 0,016 ns/op Modulo.mod 16 avgt 10 9,125 ± 0,023 ns/op
As you can see, using %
is ~2.6x slower than a conditional expression. JIT cannot optimize this automatically in the discussed ThreadLocal
code, because the divisor (table.length
) is variable.
mod
is not that slow in Java. It's implemented as the byte code instructions irem
and frem
for Integers and Floats respectively. The JIT does a good job of optimizing this.
In my benchmarks (see article), irem
calls in JDK 1.8 take about 1 nanosecond. That's pretty quick. frem
calls are about 3x slower, so use integers where possible.
If you're using Natural Integers (e.g. array indexing) and a power of 2 Divisor (e.g. 8 thread locals), then you can use a bit twiddling trick to get a 20% performance gain.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With