I'm creating an infinite stream of Integers starting at 200 Million, filter this stream using a naive primality test implementation to generate load and limit the result to 10.
Predicate<Integer> isPrime = new Predicate<Integer>() {
@Override
public boolean test(Integer n) {
for (int i = 2; i < n; i++) {
if (n % i == 0) return false;
}
return true;
}
};
Stream.iterate(200_000_000, n -> ++n)
.filter(isPrime)
.limit(10)
.forEach(i -> System.out.print(i + " "));
This works as expected.
Now, if I add a call to parallel() before filtering, nothing is produced and the processing does not complete.
Stream.iterate(200_000_000, n -> ++n)
.parallel()
.filter(isPrime)
.limit(10)
.forEach(i -> System.out.print(i + " "));
Can someone point me in the right direction of what's happening here?
EDIT: I am not looking for better primality test implementations (it is intended to be a long running implementation) but for an explanation of the negative impact of using a parallel stream.
Processing actually completes, though may take quite a long time depending on number of hardware threads on your machine. API documentation about limit warns that it might be slow for parallel streams.
Actually the parallel stream first splits the computation to the several parts according to the available parallelism level, performs a computation for every part, then join the results together. How many parts do you have in your task? One per common FJP thread (=Runtime.getRuntime().availableProcessors()
) plus (sometimes?) one for current thread if it's not in FJP. You can control it adding
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
Practically for your task the lower number you set, the faster it will compute.
How to split the unlimited task? You particular task is handled by IteratorSpliterator which trySplit method creates chunks of ever-increasing size starting from 1024. You may try by yourself:
Spliterator<Integer> spliterator = Stream.iterate(200_000_000, n -> ++n).spliterator();
Spliterator[] spliterators = new Spliterator[10];
for(int i=0; i<spliterators.length; i++) {
spliterators[i] = spliterator.trySplit();
}
for(int i=0; i<spliterators.length; i++) {
System.out.print((i+1)+": ");
spliterators[i].tryAdvance(System.out::println);
}
So the first chunk handles numbers of range 200000000-200001023, the second handles numbers of range 200001024-200003071 and so on. If you have only 1 hardware thread, your task will be split to two chunks, so 3072 will be checked. If you have 8 hardware threads, your task will be split to 9 chunks and 46080 numbers will be checked. Only after all the chunks are processed the parallel computation will stop. The heuristic of splitting the task to such a big chunks doesn't work good in your case, but you would see the performance boost had the prime numbers around that region appear once in several thousand numbers.
Probably your particular scenario could be optimized internally (i.e. stop the computation if the first thread found that limit condition is already achieved). Feel free to report a bug to Java bug tracker.
Update after digging more inside the Stream API I concluded that current behavior is a bug, raised an issue and posted a patch. It's likely that the patch will be accepted for JDK9 and probably even backported to JDK 8u branch. With my patch the parallel version still does not improve the performance, but at least its working time is comparable to sequential stream working time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With