I was going through the concept of parallel sorting introduced in Java 8. As per the doc:
If the length of the specified array is less than the minimum granularity, then it is sorted using the appropriate Arrays.sort method.
The spec, however, doesn't specify this minimum limit. When I looked up the code in java.util.Arrays, it was defined as
private static final int MIN_ARRAY_SORT_GRAN = 1 << 13;
i.e., 8192 values in the array
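For reference, the dispatch in parallelSort(int[]) looks roughly like this (lightly paraphrased from the JDK 8 sources; every primitive overload follows the same pattern with the same constant):

public static void parallelSort(int[] a) {
    int n = a.length, p, g;
    if (n <= MIN_ARRAY_SORT_GRAN ||
        (p = ForkJoinPool.getCommonPoolParallelism()) == 1)
        // small array, or no parallelism available: plain sequential sort
        DualPivotQuicksort.sort(a, 0, n - 1, null, 0, 0);
    else
        // fork/join sort; the granularity never drops below MIN_ARRAY_SORT_GRAN
        new ArraysParallelSortHelpers.FJInt.Sorter
            (null, a, new int[n], 0, n, 0,
             ((g = n / (p << 2)) <= MIN_ARRAY_SORT_GRAN) ?
              MIN_ARRAY_SORT_GRAN : g).invoke();
}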
As per the explanation provided here, I understand why the value was hard-coded as 8192: it was chosen with current CPU architecture in mind. With the -XX:+UseCompressedOops option enabled by default, any system with less than 32 GB of RAM uses 32-bit (4-byte) pointers. Now, with an L1 data cache of 32 KB, we can pass 32 KB / 4 bytes = 8 KB of data to the CPU for computation at once. That's equivalent to 8192 bytes of data being processed at a time.
So for a function that sorts a byte array, parallelSort(byte[]), this makes sense: you can keep the minimum parallel-sort limit at 8192 values (each value = 1 byte for a byte array).
But if you consider public static void parallelSort(int[] a), an int variable is 4 bytes (32 bits). So ideally, of those 8192 bytes, we can store 8192/4 = 2048 numbers in the CPU cache at once, and the minimum granularity in this case is supposed to be 2048.
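In other words, if the goal were to fill a fixed byte budget of the cache, I'd expect the threshold to scale with the element size, something like this (hypothetical code illustrating my expectation, not what the JDK actually does):

// Hypothetical per-type granularity if the goal were "fill an 8 KB budget":
static int minGranularity(int elementBytes) {
    int budgetBytes = 8 * 1024;          // the 8192-byte budget from my reading above
    return budgetBytes / elementBytes;   // byte[] -> 8192, int[] -> 2048, long[] -> 1024
}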
Why do all the parallelSort functions in Java (be it for byte[], int[], long[], etc.) use 8192 as the default minimum number of values needed to perform parallel sorting? Shouldn't it vary according to the type passed to the parallelSort function?
First, it seems that you've misread the linked explanation. The L1 data cache is 32 KB, so int[] fits it ideally: 32768/4 = 8192 ints can be placed into the L1 cache at once.
Second, I don't think the given explanation is correct. It concentrates on pointers, so it mainly speaks about sorting object arrays, but when you compare the data in an object array, you always need to dereference these pointers to access the real data. And if your objects have non-primitive fields, you'll have to dereference them even further. For example, if you sort an array of strings, you have to access not only the array itself, but also the String objects and the char[] arrays stored inside them. All of these require many additional cache lines.
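To make that concrete, here's a minimal illustration (plain Java, nothing JDK-internal assumed):

import java.util.Arrays;

class StringSortDemo {
    public static void main(String[] args) {
        String[] names = { "mango", "apple", "kiwi", "banana" };
        // Each comparison dereferences two String objects and, inside them,
        // their backing char arrays: several extra cache lines per compare,
        // so a "does the array fit in L1?" model breaks down for object arrays.
        Arrays.parallelSort(names);
        System.out.println(Arrays.toString(names));
    }
}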
I did not find any explicit explanation of this particular value in the review thread for this change. Previously it was 256; it was then changed to 8192 as part of the JDK-8014076 update. I think it simply showed the best performance on some reasonable test suite. Keeping separate thresholds for different cases would add more complexity, and the tests probably showed that it doesn't pay off. Note that an ideal threshold is impossible for Object[] arrays, as the compare function is user-specified and can have arbitrary complexity. For a sufficiently complex compare function it's probably reasonable to parallelize even very small arrays.
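As an illustration of that last point, here's a sketch with a deliberately expensive comparator (collatzLength is my own example key, not anything from the JDK): with only 500 elements, Arrays.parallelSort quietly falls back to the sequential sort, even though each comparison is costly enough that parallelism might pay off.

import java.util.Arrays;
import java.util.Comparator;

class ExpensiveCompareDemo {
    // Deliberately costly sort key: each comparison does real work,
    // dwarfing the cache effects that the 8192 threshold was tuned for.
    static long collatzLength(long n) {
        long steps = 0;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        Integer[] seeds = new Integer[500];
        Arrays.setAll(seeds, i -> i + 1);
        // Length 500 < 8192, so this runs sequentially regardless of compare cost.
        Arrays.parallelSort(seeds,
            Comparator.comparingLong(ExpensiveCompareDemo::collatzLength));
    }
}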