Is modulo slow in Java?


I've been looking at the implementation of ThreadLocal in the JDK out of curiosity, and I found this:


```java
/**
 * Increment i modulo len.
 */
private static int nextIndex(int i, int len) {
    return ((i + 1 < len) ? i + 1 : 0);
}
```

It looks fairly obvious that this could be implemented with a simple return (i + 1) % len, but I think these guys know their stuff. Any idea why they did this?

This code is highly performance-oriented, with a custom map for holding thread-local mappings, weak references to help the GC, and other clever tricks, so I guess this is a matter of performance. Is modulo slow in Java?
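One detail worth making explicit: the two forms are only interchangeable when i is already a valid index (0 <= i < len), as it is when stepping through the slots of a table. The sketch below (the class WrapIndex and the helper nextIndexMod are hypothetical names, not JDK API) checks that both forms agree on that range and shows where they diverge outside it.

```java
// Hypothetical sketch comparing the JDK's conditional wrap with the % form.
final class WrapIndex {

    // Same logic as the nextIndex snippet quoted above.
    static int nextIndex(int i, int len) {
        return (i + 1 < len) ? i + 1 : 0;
    }

    // The "obvious" alternative suggested in the question.
    static int nextIndexMod(int i, int len) {
        return (i + 1) % len;
    }

    public static void main(String[] args) {
        int len = 16;
        // For every valid index 0 <= i < len both forms agree.
        for (int i = 0; i < len; i++) {
            if (nextIndex(i, len) != nextIndexMod(i, len)) {
                throw new AssertionError("mismatch at i = " + i);
            }
        }
        // Outside that range they differ: 20 is not a valid index for len 16.
        System.out.println(nextIndex(20, len) + " vs " + nextIndexMod(20, len)); // prints "0 vs 5"
    }
}
```

So replacing one with the other is only safe because the callers never pass an out-of-range i.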


asked Mar 04 '16 00:03

Dici

2 Answers

% is avoided for performance reasons in this example.

Division and remainder operations are slow at the CPU level, not only in Java. For example, on Haswell the 32-bit idiv instruction has a latency of more than 20 cycles, compared with 1 cycle for add.

Let's benchmark using JMH.


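```java
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class Modulo {
    @Param("16")
    int len;

    int i;

    // Cost of the benchmark harness itself, for reference
    @Benchmark
    public int baseline() {
        return i;
    }

    // The approach used by ThreadLocal's nextIndex
    @Benchmark
    public int conditional() {
        return i = (i + 1 < len) ? i + 1 : 0;
    }

    // Bit mask; only equivalent to % when len is a power of two
    @Benchmark
    public int mask() {
        return i = (i + 1) & (len - 1);
    }

    // Plain remainder with a non-constant divisor
    @Benchmark
    public int mod() {
        return i = (i + 1) % len;
    }
}
```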

Results:


```
Benchmark           (len)  Mode  Cnt  Score   Error  Units
Modulo.baseline        16  avgt   10  2,951 ± 0,038  ns/op
Modulo.conditional     16  avgt   10  3,517 ± 0,051  ns/op
Modulo.mask            16  avgt   10  3,765 ± 0,016  ns/op
Modulo.mod             16  avgt   10  9,125 ± 0,023  ns/op
```

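As you can see, using % is about 2.6x slower than the conditional expression. The JIT cannot optimize this automatically in the ThreadLocal code, because the divisor (table.length) is not a constant: with a constant divisor it could replace the division with cheaper multiply and shift operations, but with a variable one it has to emit a real division.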
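To illustrate where a variable divisor comes from, here is a hypothetical open-addressing set of non-negative int keys. ProbeTable is an illustrative name; it is loosely modeled on linear probing in general, not copied from the JDK's ThreadLocalMap. The table length is only known at run time, so each probe step uses the conditional wrap instead of %.

```java
import java.util.Arrays;

// Hypothetical sketch: a fixed-capacity linear-probing set of non-negative ints.
final class ProbeTable {
    private static final int EMPTY = -1; // marks an unused slot
    private final int[] table;

    ProbeTable(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        table = new int[capacity];
        Arrays.fill(table, EMPTY);
    }

    // Same conditional wrap as ThreadLocal's nextIndex: no division involved.
    private static int nextIndex(int i, int len) {
        return (i + 1 < len) ? i + 1 : 0;
    }

    // Inserts key; returns false if it was already present.
    boolean add(int key) {
        if (key < 0) {
            throw new IllegalArgumentException("key must be non-negative");
        }
        int len = table.length;    // not a compile-time constant
        int i = key % len;         // one % to pick the home slot
        for (int probes = 0; probes < len; probes++) {
            if (table[i] == EMPTY) {
                table[i] = key;
                return true;
            }
            if (table[i] == key) {
                return false;
            }
            i = nextIndex(i, len); // each probe step avoids %
        }
        throw new IllegalStateException("table is full");
    }

    boolean contains(int key) {
        if (key < 0) {
            return false;
        }
        int len = table.length;
        int i = key % len;
        for (int probes = 0; probes < len && table[i] != EMPTY; probes++) {
            if (table[i] == key) {
                return true;
            }
            i = nextIndex(i, len);
        }
        return false;
    }

    public static void main(String[] args) {
        ProbeTable t = new ProbeTable(8);
        t.add(3);  // home slot 3
        t.add(11); // 11 % 8 == 3 is taken, so it lands in slot 4
        t.add(7);  // home slot 7
        t.add(15); // 15 % 8 == 7 is taken, so nextIndex wraps to slot 0
        System.out.println(t.contains(15) + " " + t.contains(5)); // prints "true false"
    }
}
```

The probe loop is where the wrap runs repeatedly under collisions, which is why shaving the division off each step can matter.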


answered Sep 28 '22 06:09

apangin

mod is not that slow in Java. The % operator is compiled to the bytecode instructions irem and frem for int and float operands respectively (lrem and drem for long and double). The JIT does a good job of optimizing these.
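As a quick way to check the bytecode claim, the sketch below (RemainderDemo is a made-up class name) wraps % in methods with non-constant operands, so javac cannot fold the results at compile time; running javap -c on the compiled class should show irem, lrem, frem and drem respectively. It also shows that Java's % keeps the sign of the dividend, which is why the bit-twiddling trick mentioned in this answer is restricted to non-negative values.

```java
// Hypothetical demo class; compile it and run `javap -c RemainderDemo`.
final class RemainderDemo {
    static int remInt(int a, int b) { return a % b; }             // irem
    static long remLong(long a, long b) { return a % b; }         // lrem
    static float remFloat(float a, float b) { return a % b; }     // frem
    static double remDouble(double a, double b) { return a % b; } // drem

    public static void main(String[] args) {
        System.out.println(remInt(7, 3));          // prints 1
        System.out.println(remLong(7L, 3L));       // prints 1
        System.out.println(remFloat(7.5f, 2f));    // prints 1.5
        System.out.println(remDouble(7.5, 2.0));   // prints 1.5
        // The result takes the sign of the dividend, so % is not a
        // mathematical modulo for negative inputs.
        System.out.println(remInt(-7, 3));         // prints -1
        System.out.println(Math.floorMod(-7, 3));  // prints 2
    }
}
```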

In my benchmarks (see article), irem calls in JDK 1.8 take about 1 nanosecond, which is pretty quick. frem calls are about 3x slower, so use integers where possible.

If you're working with non-negative integers (e.g. array indices) and a power-of-2 divisor (e.g. 8 thread locals), you can use a bit-twiddling trick, x & (divisor - 1), to get a roughly 20% performance gain.
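A minimal sketch of that trick, assuming the divisor n is a power of two and the dividend x is non-negative (PowerOfTwoMask, isPowerOfTwo and modPow2 are illustrative names, not from the answer or the JDK). Masking with n - 1 keeps the low bits of x, which is exactly x % n under those conditions.

```java
// Hypothetical helper class illustrating the power-of-two mask trick.
final class PowerOfTwoMask {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Same result as x % n when x >= 0 and n is a power of two.
    // The caller is responsible for both preconditions; nothing is
    // checked here so the hot path stays branch-free.
    static int modPow2(int x, int n) {
        return x & (n - 1);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1 << 10; n <<= 1) {
            for (int x = 0; x < 5000; x++) {
                if (modPow2(x, n) != x % n) {
                    throw new AssertionError("mismatch for x=" + x + ", n=" + n);
                }
            }
        }
        // For a negative x the two differ: % keeps the sign, the mask does not.
        System.out.println((-5 % 8) + " vs " + modPow2(-5, 8)); // prints "-5 vs 3"
    }
}
```

For negative x and a power-of-two n, the mask happens to match Math.floorMod(x, n) rather than %, so it is only a drop-in replacement for % on non-negative values.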

answered Sep 28 '22 04:09

Joseph Lust
