In the "95% of performance is about clean representative models" talk by Martin Thompson, between 17 and 21 minute, such code is presented:
public class Queue
{
private final Object[] buffer;
private final int capacity;
// Rest of the code
}
In 20:16 he says:
You can get much better performance, so leaving things like
capacity
in there is the right thing to do.
I tried to come up with a code example in which capacity
will be much faster than buffer.length
, but I failed.
Martin is saying that problems arise in two scenerios:
length
field is also final
, JLS 10.7. So, I don't see how this could be a problem.capacity
vs buffer.length
one million times (with queue having one million of elements), but there was no significant difference. I used JMH for benchmarking.Could you please provide a code example, which demonstrates a case in which capacity
is superior to buffer.length
in terms of performance?
The more common case (frequently spotted in the real code), the better.
Please note, that I'm totally taking away the aspect of aesthetics, clean code, potential for code re-factoring etc. I'm asking only about the performance.
array.length: length is a final variable applicable for arrays. With the help of the length variable, we can obtain the size of the array. Examples: int size = arr[].length; // length can be used // for int[], double[], String[] // to know the length of the arrays.
array. length: length is a final variable applicable for arrays. With the help of the length variable, we can obtain the size of the array. string.
BinarySearch will be faster than List<T>.
ArrayList doesn't have length() method, the size() method of ArrayList provides the number of objects available in the collection. Array has length property which provides the length or capacity of the Array. It is the total space allocated during the initialization of the array.
When you access array normally, JVM uses its length
anyway to perform bounds check. But when you access array via sun.misc.Unsafe
(like Martin does), you don't have to pay this implicit penalty.
Array's length
field usually lies in the same cache line as its first elements, so you will have false sharing when multiple threads write into first indices concurrently. Using the separate field for buffer capacity will break this false sharing.
Here is a benchmark that shows how capacity
field makes an array access substantially faster:
package bench;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import sun.misc.Unsafe;
import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicReferenceArray;
@State(Scope.Benchmark)
@Threads(4)
public class Queue {
private static final Unsafe unsafe = getUnsafe();
private static final long base = unsafe.arrayBaseOffset(Object[].class);
private static final int scale = unsafe.arrayIndexScale(Object[].class);
private AtomicReferenceArray<Object> atomic;
private Object[] buffer;
private int capacity;
@Param({"0", "25"})
private volatile int index;
@Setup
public void setup() {
capacity = 32;
buffer = new Object[capacity];
atomic = new AtomicReferenceArray<>(capacity);
}
@Benchmark
public void atomicArray() {
atomic.set(index, "payload");
}
@Benchmark
public void unsafeArrayLength() {
int index = this.index;
if (index < 0 || index >= buffer.length) {
throw new ArrayIndexOutOfBoundsException();
}
unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
}
@Benchmark
public void unsafeCapacityField() {
int index = this.index;
if (index < 0 || index >= capacity) {
throw new ArrayIndexOutOfBoundsException();
}
unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
}
private static Unsafe getUnsafe() {
try {
Field f = Unsafe.class.getDeclaredField("theUnsafe");
f.setAccessible(true);
return (Unsafe) f.get(null);
} catch (IllegalAccessException | NoSuchFieldException e) {
throw new AssertionError("Should not happen");
}
}
}
Results:
Benchmark (index) Mode Cnt Score Error Units
Queue.atomicArray 0 thrpt 5 41804,825 ± 928,882 ops/ms
Queue.atomicArray 25 thrpt 5 84713,201 ± 1067,911 ops/ms
Queue.unsafeArrayLength 0 thrpt 5 48656,296 ± 676,166 ops/ms
Queue.unsafeArrayLength 25 thrpt 5 88812,863 ± 1089,380 ops/ms
Queue.unsafeCapacityField 0 thrpt 5 88904,433 ± 360,936 ops/ms
Queue.unsafeCapacityField 25 thrpt 5 88633,490 ± 1426,329 ops/ms
You shouldn't percieve Martin's words directly. When he said "Using array.length
is an anti-pattern that is copied over projects", I think it's slyness.
Using the capacity
field indeed allows to improve locality, pollutes caches less and helps to avoid false sharing, but it requires to write really horrible source code, that is very far from being "clean and simple", Martin advertised in this talk.
The problem is, even you don't write array.length
in your source directly, the JVM anyway access the length (that means, accesses the array header) on each array indexing array[i]
, to check bounds. Hotspot JVM has issues with eliminating bounds checks even in "simple" looping cases, and I think it is not able to interpret some "external" checks like if (i < capacity) return array[i];
as the bound check, i. e. bind the capacity field and the array size.
That is why, to make capacity
-pattern making any sense, you need to access the array only via Unsafe
! That, unfortunately, disables many bulk loop optimizations.
Look at Martin's "clean" queue implementation :)
I also could try to explain what was meant under concurrent considerations when accessing "final" array.length
. My experiments show that even "read-read" concurrent cache line access introduce some sort of "false sharing" and slowing the things down. (I think JVM engineers considered this when made @sun.misc.Contended
to offset 128 bytes from both sides of the contended fields; probably this is to ensure both two-side cache line prefetch and "read-read false sharing" won't affect the performance.)
That is why, when queue consumers and producers access the capacity to wrap around the buffer, they better access different objects, containing the same (by value) capacity
field, and a reference to the same array. Accessing this array via unsafe producers and compumers normally access different areas of that array, don't sharing anything falsely.
IMO the antipattern now is to try implementing another Queue
, while people behind https://github.com/JCTools/JCTools (including Martin, btw) optimize this to death.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With