 

Why is the minimum granularity defined as 8192 in Java 8 for switching from parallel sort to Arrays.sort, regardless of the type of data?

I was going through the concepts of parallel sort introduced in Java 8. As per the doc:

If the length of the specified array is less than the minimum granularity, then it is sorted using the appropriate Arrays.sort method.

The spec, however, doesn't specify this minimum limit.
When I looked up the code in java.util.Arrays, it was defined as:

private static final int MIN_ARRAY_SORT_GRAN = 1 << 13; 

i.e., 8192 values in the array.
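For context, the dispatch around that constant looks roughly like this (a simplified sketch of the JDK 8 behavior, not the verbatim source; the fork/join branch is elided):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

// Simplified sketch of how Arrays.parallelSort(int[]) dispatches on
// JDK 8: below the granularity, or when the common pool offers no
// parallelism, it delegates to the sequential sort.
class ParallelSortSketch {
    static final int MIN_ARRAY_SORT_GRAN = 1 << 13; // 8192

    static void parallelSort(int[] a) {
        if (a.length <= MIN_ARRAY_SORT_GRAN ||
                ForkJoinPool.getCommonPoolParallelism() == 1) {
            Arrays.sort(a); // sequential dual-pivot quicksort
        } else {
            // The real code runs a fork/join merge sort that splits
            // the array until subranges reach the granularity; elided
            // here, delegating for brevity.
            Arrays.sort(a);
        }
    }
}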

As per the explanation provided here, I understand why the value was hard-coded as 8192.

It was designed with the current CPU architecture in mind.
With the -XX:+UseCompressedOops option enabled by default, any system with less than 32 GB of RAM uses 32-bit (4-byte) pointers. Now, with an L1 data cache size of 32 KB, we can pass 32 KB / 4 bytes = 8K of data at once to the CPU for computation. That's equivalent to 8192 bytes of data being processed at once.

So for a function that sorts a byte array, parallelSort(byte[]), this makes sense. You can keep the minimum parallel sort limit at 8192 values (each value = 1 byte for a byte array).

But if you consider public static void parallelSort(int[] a):

An int is 4 bytes (32 bits). So ideally, of those 8192 bytes, we can store 8192/4 = 2048 numbers in the CPU cache at once. So the minimum granularity in this case is supposed to be 2048.

Why do all parallelSort functions in Java (be it for byte[], int[], long[], etc.) use 8192 as the default minimum number of values needed to perform parallel sorting?
Shouldn't it vary according to the type passed to the parallelSort function?
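As an illustration of the point (array sizes here are arbitrary, chosen only to exceed the threshold): every overload goes through the same 8192-element check, even though the two arrays below occupy very different amounts of memory.

import java.util.Arrays;
import java.util.Random;

public class SameThreshold {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Both arrays exceed the 8192-element threshold and are thus
        // candidates for parallel sorting, even though the byte[]
        // holds 4x less data than the int[].
        byte[] bytes = new byte[10_000]; // ~10 KB of data
        rnd.nextBytes(bytes);
        int[] ints = rnd.ints(10_000).toArray(); // ~40 KB of data
        Arrays.parallelSort(bytes);
        Arrays.parallelSort(ints);
    }
}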

Kishore Bandi asked Mar 21 '16


1 Answer

First, it seems that you've misread the linked explanation. The L1 data cache is 32 KB, so for int[] it fits ideally: 32768 / 4 = 8192 ints can be placed into the L1 cache.
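As a quick sanity check of that arithmetic (the 32 KB L1 data cache size is an assumption about typical hardware, not something the JDK queries at runtime):

public class CacheFit {
    public static void main(String[] args) {
        int l1Bytes = 32 * 1024; // assumed L1 data cache size
        System.out.println(l1Bytes / Byte.BYTES);    // 32768 bytes
        System.out.println(l1Bytes / Integer.BYTES); // 8192 ints
        System.out.println(l1Bytes / Long.BYTES);    // 4096 longs
    }
}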

Second, I don't think the given explanation is correct. It concentrates on pointers, so it mainly applies to sorting object arrays. But when you compare the data in an object array, you always need to dereference those pointers to access the real data. And if your objects have non-primitive fields, you'll have to dereference them even further. For example, if you sort an array of strings, you have to access not only the array itself, but also the String objects and the char[] arrays stored inside them. All of these require many additional cache lines.

I did not find any explicit explanation of this particular value in the review thread for this change. Previously it was 256; it was changed to 8192 as part of the JDK-8014076 update. I think it simply showed the best performance on some reasonable test suite. Keeping separate thresholds for different element types would add more complexity, and tests probably showed that it doesn't pay off. Note that an ideal threshold is impossible for Object[] arrays anyway, as the compare function is user-specified and can have arbitrary complexity. For a sufficiently complex compare function, it's probably reasonable to parallelize even very small arrays.
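For example, a locale-sensitive comparator does far more work per comparison than comparing two ints, so the break-even array size for parallelism depends entirely on the comparator (this is just an illustration, not a benchmark):

import java.text.Collator;
import java.util.Arrays;

public class ExpensiveComparator {
    public static void main(String[] args) {
        String[] words = {"zebra", "Äpfel", "apple", "Zürich"};
        // Collator.compare performs locale-sensitive collation: much
        // more work per call than an int comparison, and the JDK has
        // no way to account for this when fixing a single threshold.
        Arrays.parallelSort(words, Collator.getInstance());
        System.out.println(Arrays.toString(words));
    }
}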

Tagir Valeev answered Sep 18 '22