Java 8 provides java.util.Arrays.parallelSort
, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort
for sorting lists.
I can use toArray
, sort that array, and store the result back in my list, but that will temporarily increase memory usage, which if I'm using parallel sorting is already high because parallel sorting only pays off for huge lists. Instead of twice the memory (the list plus parallelSort's working memory), I'm using thrice (the list, the temporary array and parallelSort's working memory). (Arrays.parallelSort documentation says "The algorithm requires a working space no greater than the size of the original array".)
Memory usage aside, Collections.parallelSort would also be more convenient for what seems like a reasonably common operation. (I tend not to use arrays directly, so I'd certainly use it more often than Arrays.parallelSort.)
The library can test for RandomAccess to avoid trying to e.g. quicksort a linked list, so that can't a reason for a deliberate omission.
How can I sort a List in parallel without creating a temporary array?
Java 8 introduced a new method called as parallelSort() in java.util.Arrays Class. It uses Parallel Sorting of array elements. Algorithm of parallelSort() 1. The array is divided into sub-arrays and that sub-arrays is again divided into their sub-arrays, until the minimum level of detail in a set of array.
There doesn't appear to be any straightforward way to sort a List
in parallel in Java 8. I don't think this is fundamentally difficult; it looks more like an oversight to me.
The difficulty with a hypothetical Collections.parallelSort(list, cmp)
is that the Collections
implementation knows nothing about the list's implementation or its internal organization. This can be seen by examining the Java 7 implementation of Collections.sort(list, cmp)
. As you observed, it has to copy the list elements out to an array, sort them, and then copy them back into the list.
This is the big advantage of the List.sort(cmp)
extension method over Collections.sort(list, cmp)
. It might seem that this is merely a small syntactic advantage being able to write myList.sort(cmp)
instead of Collections.sort(myList, cmp)
. The difference is that myList.sort(cmp)
, being an interface extension method, can be overridden by the specific List
implementation. For example, ArrayList.sort(cmp)
sorts the list in-place using Arrays.sort()
whereas the default implementation implements the old copyout-sort-copyback technique.
It should be possible to add a parallelSort
extension method to the List
interface that has similar semantics to List.sort
but does the sorting in parallel. This would allow ArrayList
to do a straightforward in-place sort using Arrays.parallelSort
. (It's not entirely clear to me what the default implementation should do. It might still be worth it to do copyout-parallelSort-copyback.) Since this would be an API change, it can't happen until the next major release of Java SE.
As for a Java 8 solution, there are a couple workarounds, none very pretty (as is typical of workarounds). You could create your own array-based List
implementation and override sort()
to sort in parallel. Or you could subclass ArrayList
, override sort()
, grab the elementData
array via reflection and call parallelSort()
on it. Of course you could just write your own List
implementation and provide a parallelSort()
method, but the advantage of overriding List.sort()
is that this works on the plain List
interface and you don't have to modify all the code in your code base to use a different List
subclass.
I think you are doomed to use a custom List
implementation augmented with your own parallelSort
or else change all your other code to store the big data in Array
types.
This is the inherent problem with layers of abstract data types. They're meant to isolate the programmer from details of implementation. But when the details of implementation matter - as in the case of underlying storage model for sort - the otherwise splendid isolation leaves the programmer helpless.
The standard List
sort documents provide an example. After the explanation that mergesort is used, they say
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.)
In other words, "since we don't know the underlying storage model for a List
and couldn't touch it if we did, we make a copy organized in a known way." The parenthesized expression is based on the fact that the List
"i'th element accessor" on a linked list is Omega(n), so the normal array mergesort implemented with it would be a disaster. In fact it's easy to implement mergesort efficiently on linked lists. The List
implementer is just prevented from doing it.
A parallel sort on List
has the same problem. The standard sequential sort fixes it with custom sort
s in the concrete List
implementations. The Java folks just haven't chosen to go there yet. Maybe in Java 9.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With