Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of Java Parallel Stream vs ExecutorService

Suppose we have a list and want to pick all the elements satisfying a property(let say some functions f). There are 3 ways to parallel this process.

One :

listA.parallelStream.filter(element -> f(element)).collect(Collectors.toList());

Two:

listA.parallelStream.collect(Collectors.partitioningBy(element -> f(element))).get(true);

Three:

ExecutorService executorService = Executors.newFixedThreadPool(nThreads);
//separate the listA into several batches
for each batch {
     Future<List<T>> result = executorService.submit(() -> {
          // test the elements in this batch and return the validate element list
     });
}
//merge the results from different threads.

Suppose the testing function is a CPU intensive task. I want to know which method is more efficient. Thanks a lot.

like image 610
Polaris Avatar asked Sep 12 '18 03:09

Polaris


2 Answers

One and two use ForkJoinPool which is designed exactly for parallel processing of one task while ThreadPoolExecutor is used for concurrent processing of independent tasks. So One and Two are supposed to be faster.

like image 54
Evgeniy Dorofeev Avatar answered Sep 22 '22 12:09

Evgeniy Dorofeev


When you use .filter(element -> f(element)).collect(Collectors.toList()), it will collect the matching elements into a List, whereas .collect(Collectors.partitioningBy(element -> f(element))) will collect all elements into either of two lists, followed by you dropping one of them and only retrieving the list of matches via .get(true).

It should be obvious that the second variant can only be on par with the first in the best case, i.e. if all elements match the predicate anyway or when the JVM’s optimizer is capable of removing the redundant work. In the worst lase, e.g. when no element matches, the second variant collects a list of all elements only to drop it afterwards, where the first variant would not collect any element.

The third variant is not comparable, as you didn’t show an actual implementation but just a sketch. There is no point in comparing a hypothetical implementation with an actual. The logic you describe, is the same as the logic of the parallel stream implementation. So you’re just reinventing the wheel. It may happen that you do something slightly better than the reference implementation or just better tailored to your specific task, but chances are much higher that you overlook things which the Stream API implementors did already consider during the development process which lasted several years.

So I wouldn’t place any bets on your third variant. If we add the time you will need to complete the third variant’s implementation, it can never be more efficient than just using either of the other variants.

So the first variant is the most efficient variant, especially as it is also the simplest, most readable, directly expressing your intent.

like image 41
Holger Avatar answered Sep 21 '22 12:09

Holger