Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there ever a reason not to use Java 8's parallelSort?

I was reading this question about the differences between Java's Arrays.sort and Arrays.parallelSort, which is a few years old by now. What surprised me is that there was only one question that mentioned any downside to using parallelSort; namely that the speedup decreases if you're using a lot of your CPU.

Assuming you're not in some sort of specialized, single-threaded environment, should one always choose parallelSort? Is there ever a reason not to? Note that one of the answers to the question above mentions that if there are fewer than 4096 elements, parallelSort simply calls sort anyway.

like image 682
Calvin Godfrey Avatar asked Jul 01 '19 19:07

Calvin Godfrey


People also ask

What is the difference between sort and parallelSort in Java?

The parallelSort() is functionally different. Unlike sort(), which sorts data sequentially using a single thread, it uses a parallel sort-merge sorting algorithm. It breaks the array into sub-arrays that are themselves sorted and then merged. For executing parallel tasks it uses the ForkJoin pool.

What is the use of parallel sort in Java?

Parallel Sort uses Fork/Join framework introduced in Java 7 to assign the sorting tasks to multiple threads available in the thread pool. Fork/Join implements a work stealing algorithm where in a idle thread can steal tasks queued up in another thread.

What is arrays parallelSort?

Parallel array sort in Java 8 util. Arrays class for Parallel Sorting. In parallel array sorting the sorting algorithm is a parallel sort-merge that breaks the array into sub-arrays that are themselves sorted and then merged. The Fork/Join common thread pool is used to execute any parallel tasks.


1 Answers

There are some downsides to using Arrays.parallelSort

  • it uses the ForkJoinPool.commonPool() and will fight with other functions that use it by default (e.g. parallel() on a stream)
  • the thread-pool Arrays.parallelSort uses is not configurable (only on a global level by increasing the common-pools thread amount)
  • it performs worse on small data sets (more often than not arrays contain few elements, the JDK even acknowledges that e.g. most ArrayList stay empty for their whole lifetime which saves quite a bit of memory and CPU time for not instantiating arrays that will never be filled)

And another anecdotal scenario: say if you implement some card game that needs sorting. Its embarassingly easy to parallelize multiple game executions next to each other instead of parallelizing the sorting mechanism of one run which may only take a fraction of the whole game loop. You lost an easy way to parallelize now (e.g. when running the game in the context of genetic algorithms).

But yes, if you happen to have large arrays and sorting is a substantial part of your applications runtime use Arrays.parallelSort.

EDIT: And even if Arrays.parallelSort switches to a normal sort if the given array has less than 4096 elements: it's all about showing intention - you want a parallel sort if possible which holds a different meaning than just calling sort. And to be nitpicky: it does indeed perform worse on small arrays as it has to do the additional check if the array is containing less than 4096 elements and some other checks about the common pools thread count (whichs overhead is of course negligible) :).

like image 110
roookeee Avatar answered Sep 21 '22 17:09

roookeee