In the "95% of performance is about clean representative models" talk by Martin Thompson, between 17 and 21 minute, such code is presented: <pre class="prettyprint"><code>public class Queue { private final Object[] buffer; private final int capacity; // Rest of the code } </code></pre> In 20:16 he says: <blockquote> You can get much better performance, so leaving things like <code>capacity</code> in there is the right thing to do. </blockquote> I tried to come up with a code example in which <code>capacity</code> will be much faster than <code>buffer.length</code>, but I failed. Martin is saying that problems arise in two scenerios: <ol> <li> In a concurrent world. But, <code>length</code> field is also <code>final</code>, JLS 10.7. So, I don't see how this could be a problem.</li> <li> When a cache misses. I tried calling <code>capacity</code> vs <code>buffer.length</code> one million times (with queue having one million of elements), but there was no significant difference. I used JMH for benchmarking.</li> </ol> Could you please provide a code example, which demonstrates a case in which <code>capacity</code> is superior to <code>buffer.length</code> in terms of performance? The more common case (frequently spotted in the real code), the better. Please note, that I'm totally taking away the aspect of aesthetics, clean code, potential for code re-factoring etc. I'm asking only about the performance.

You shouldn't percieve Martin's words directly. When he said "Using <code>array.length</code> is an anti-pattern that is copied over projects", I think it's slyness. Using the <code>capacity</code> field indeed allows to improve locality, pollutes caches less and helps to avoid false sharing, but it requires to write really horrible source code, that is very far from being "clean and simple", Martin advertised in this talk. The problem is, even you don't write <code>array.length</code> in your source directly, the JVM anyway access the length (that means, accesses the array header) on each array indexing <code>array[i]</code>, to check bounds. Hotspot JVM has issues with eliminating bounds checks even in "simple" looping cases, and I think it is not able to interpret some "external" checks like <code>if (i < capacity) return array[i];</code> as the bound check, i. e. bind the capacity field and the array size. That is why, to make <code>capacity</code>-pattern making any sense, you need to access the array only via <code>Unsafe</code>! That, unfortunately, disables many bulk loop optimizations. Look at Martin's "clean" queue implementation :) I also could try to explain what was meant under concurrent considerations when accessing "final" <code>array.length</code>. My experiments show that even "read-read" concurrent cache line access introduce some sort of "false sharing" and slowing the things down. (I think JVM engineers considered this when made <code>@sun.misc.Contended</code> to offset 128 bytes from both sides of the contended fields; probably this is to ensure both two-side cache line prefetch and "read-read false sharing" won't affect the performance.) That is why, when queue consumers and producers access the capacity to wrap around the buffer, they better access different objects, containing the same (by value) <code>capacity</code> field, and a reference to the same array. Accessing this array via unsafe producers and compumers normally access different areas of that array, don't sharing anything falsely. IMO the antipattern now is to try implementing another <code>Queue</code>, while people behind https://github.com/JCTools/JCTools (including Martin, btw) optimize this to death.

Is using own int capacity faster than using .length field of an array?

Tags:

java

performance

caching

concurrency

In the "95% of performance is about clean representative models" talk by Martin Thompson, between 17 and 21 minute, such code is presented:

public class Queue
{
    private final Object[] buffer;
    private final int capacity;

    // Rest of the code

}

In 20:16 he says:

You can get much better performance, so leaving things like capacity in there is the right thing to do.

I tried to come up with a code example in which capacity will be much faster than buffer.length, but I failed.

Martin is saying that problems arise in two scenerios:

In a concurrent world. But, length field is also final, JLS 10.7. So, I don't see how this could be a problem.
When a cache misses. I tried calling capacity vs buffer.length one million times (with queue having one million of elements), but there was no significant difference. I used JMH for benchmarking.

Could you please provide a code example, which demonstrates a case in which capacity is superior to buffer.length in terms of performance?

The more common case (frequently spotted in the real code), the better.

Please note, that I'm totally taking away the aspect of aesthetics, clean code, potential for code re-factoring etc. I'm asking only about the performance.

544

asked May 31 '15 23:05

Adam Stelmaszczyk

2 Answers

When you access array normally, JVM uses its length anyway to perform bounds check. But when you access array via sun.misc.Unsafe (like Martin does), you don't have to pay this implicit penalty.

Array's length field usually lies in the same cache line as its first elements, so you will have false sharing when multiple threads write into first indices concurrently. Using the separate field for buffer capacity will break this false sharing.

Here is a benchmark that shows how capacity field makes an array access substantially faster:

package bench;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import sun.misc.Unsafe;

import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicReferenceArray;

@State(Scope.Benchmark)
@Threads(4)
public class Queue {
    private static final Unsafe unsafe = getUnsafe();
    private static final long base = unsafe.arrayBaseOffset(Object[].class);
    private static final int scale = unsafe.arrayIndexScale(Object[].class);

    private AtomicReferenceArray<Object> atomic;
    private Object[] buffer;
    private int capacity;

    @Param({"0", "25"})
    private volatile int index;

    @Setup
    public void setup() {
        capacity = 32;
        buffer = new Object[capacity];
        atomic = new AtomicReferenceArray<>(capacity);
    }

    @Benchmark
    public void atomicArray() {
        atomic.set(index, "payload");
    }

    @Benchmark
    public void unsafeArrayLength() {
        int index = this.index;
        if (index < 0 || index >= buffer.length) {
            throw new ArrayIndexOutOfBoundsException();
        }
        unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
    }

    @Benchmark
    public void unsafeCapacityField() {
        int index = this.index;
        if (index < 0 || index >= capacity) {
            throw new ArrayIndexOutOfBoundsException();
        }
        unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
    }

    private static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (IllegalAccessException | NoSuchFieldException e) {
            throw new AssertionError("Should not happen");
        }
    }
}

Results:

Benchmark                  (index)   Mode  Cnt      Score      Error   Units
Queue.atomicArray                0  thrpt    5  41804,825 ±  928,882  ops/ms
Queue.atomicArray               25  thrpt    5  84713,201 ± 1067,911  ops/ms
Queue.unsafeArrayLength          0  thrpt    5  48656,296 ±  676,166  ops/ms
Queue.unsafeArrayLength         25  thrpt    5  88812,863 ± 1089,380  ops/ms
Queue.unsafeCapacityField        0  thrpt    5  88904,433 ±  360,936  ops/ms
Queue.unsafeCapacityField       25  thrpt    5  88633,490 ± 1426,329  ops/ms

answered Nov 05 '22 20:11

apangin

You shouldn't percieve Martin's words directly. When he said "Using array.length is an anti-pattern that is copied over projects", I think it's slyness.

Using the capacity field indeed allows to improve locality, pollutes caches less and helps to avoid false sharing, but it requires to write really horrible source code, that is very far from being "clean and simple", Martin advertised in this talk.

The problem is, even you don't write array.length in your source directly, the JVM anyway access the length (that means, accesses the array header) on each array indexing array[i], to check bounds. Hotspot JVM has issues with eliminating bounds checks even in "simple" looping cases, and I think it is not able to interpret some "external" checks like if (i < capacity) return array[i]; as the bound check, i. e. bind the capacity field and the array size.

That is why, to make capacity-pattern making any sense, you need to access the array only via Unsafe! That, unfortunately, disables many bulk loop optimizations.

Look at Martin's "clean" queue implementation :)

I also could try to explain what was meant under concurrent considerations when accessing "final" array.length. My experiments show that even "read-read" concurrent cache line access introduce some sort of "false sharing" and slowing the things down. (I think JVM engineers considered this when made @sun.misc.Contended to offset 128 bytes from both sides of the contended fields; probably this is to ensure both two-side cache line prefetch and "read-read false sharing" won't affect the performance.)

That is why, when queue consumers and producers access the capacity to wrap around the buffer, they better access different objects, containing the same (by value) capacity field, and a reference to the same array. Accessing this array via unsafe producers and compumers normally access different areas of that array, don't sharing anything falsely.

IMO the antipattern now is to try implementing another Queue, while people behind https://github.com/JCTools/JCTools (including Martin, btw) optimize this to death.

answered Nov 05 '22 19:11

leventov

Related questions
                            
                                Where does getRequestDispatcher("path") look?
                            
                                Java loop gets slower after some runs / JIT's fault?
                            
                                What does this nested annotation do / allow?
                            
                                CoberturaInstrumenter - Unable to instrument file?
                            
                                Auth with Unirest JAVA
                            
                                How could I sniff network traffic in Java?
                            
                                What is .jws , .jpr and .jpx files in jdeveloper/adf?
                            
                                Maven project depending on two versions of the same artifact
                            
                                how to capture subscribe event in my webSocket server with Spring 4
                            
                                What is temporal object in Java?
                            
                                Spring 4 REST program to return a List in XML using JAXB
                            
                                Mockito - check if ANY method was called on an object(object was accessed)
                            
                                How do I resolve Spring model variables in a JSP when debugging with IntelliJ IDEA?
                            
                                Java 8 ambiguous method reference for generic class
                            
                                JavaFx Webview JDK 8 can not load self signed certificate
                            
                                How to extend expiration time java json web token?
                            
                                exclude some fields from @Valid validation
                            
                                How to get instance of javax.ws.rs.core.UriInfo
                            
                                How to add raw XML text to SOAPBody element
                            
                                How can a variable be null in this piece of code?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With