JDK 8 EA is out now, and I am trying to get used to lambdas and the new Stream API. I tried to sort a list with a parallel stream, but the result is always wrong:
```java
import java.util.ArrayList;
import java.util.List;

public class Test {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("C");
        list.add("H");
        list.add("A");
        list.add("A");
        list.add("B");
        list.add("F");
        list.add("");

        list.parallelStream()                     // in parallel, not just concurrently!
            .filter(s -> !s.isEmpty())            // remove empty strings
            .distinct()                           // remove duplicates
            .sorted()                             // sort them
            .forEach(s -> System.out.println(s)); // print each item
    }
}
```
OUTPUT (one run):
```
C
F
B
H
A
```
Note that the output is different every time. My question is: is this a bug, or is it not possible to sort a list in parallel? If it is not possible, why doesn't the Javadoc say so? Finally, are there other operations whose output differs depending on whether the stream is sequential or parallel?
If the forEach action of a parallel stream adds elements to an unsynchronized collection (such as an ArrayList), it does so from multiple threads at once. That is not thread-safe, and the results are unpredictable.
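A minimal sketch of the safe alternative, assuming the goal is simply to gather the filtered elements into a list. The class `SafeCollect` and method `nonEmpty` are hypothetical names for illustration. Instead of mutating a shared ArrayList from `forEach`, the pipeline builds the list itself with `collect(Collectors.toList())`, which merges per-thread partial results and keeps encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SafeCollect {
    // Unsafe pattern (do not use): a parallel forEach mutating a plain ArrayList,
    // e.g. source.parallelStream().forEach(result::add);
    // it may lose elements or throw an exception while the list resizes.

    // Safe pattern: let the stream build the list. collect() combines the
    // partial results of each thread and preserves encounter order.
    static List<String> nonEmpty(List<String> source) {
        return source.parallelStream()
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("C", "H", "A", "A", "B", "F", ""));
        System.out.println(nonEmpty(source)); // [C, H, A, A, B, F]
    }
}
```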
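If a stream is ordered, processing it sequentially or in parallel does not change its encounter order: order-respecting operations such as sorted(), collect() and forEachOrdered() give the same result either way. forEach is the exception, because it is explicitly allowed to ignore encounter order in parallel pipelines.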
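To illustrate, here is the question's pipeline with only the terminal operation changed to `collect`. This is a sketch; `ParallelSort` and `sortDistinct` are hypothetical names. Because `collect` respects encounter order, the order established by `sorted()` survives the parallel execution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelSort {
    // Same filter/distinct/sorted steps as the question, but the terminal
    // operation is collect(), which respects encounter order.
    static List<String> sortDistinct(List<String> source) {
        return source.parallelStream()
                .filter(s -> !s.isEmpty())
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("C", "H", "A", "A", "B", "F", "");
        System.out.println(sortDistinct(source)); // [A, B, C, F, H]
    }
}
```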
list.stream() processes the elements sequentially on a single thread, so the println() calls happen in order. list.parallelStream(), on the other hand, may split the work across several threads to take advantage of multiple cores, which is why the program in the question prints the elements in a different order on each run.
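One way to see the difference is to record which threads actually touch the elements. A rough sketch, with `WhichThreads` and `threadNames` as hypothetical names: a sequential stream runs entirely on the calling thread, while a parallel stream typically also uses worker threads of the common ForkJoinPool. The exact set of workers is not guaranteed and varies by machine and run.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class WhichThreads {
    // Returns the names of all threads that processed at least one element.
    static Set<String> threadNames(List<Integer> data, boolean parallel) {
        return (parallel ? data.parallelStream() : data.stream())
                .map(x -> Thread.currentThread().getName())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());
        // Sequential: every element is handled by the calling thread ([main] here).
        System.out.println("stream():         " + threadNames(data, false));
        // Parallel: usually "main" plus ForkJoinPool.commonPool worker threads,
        // but the exact set depends on the machine and the run.
        System.out.println("parallelStream(): " + threadNames(data, true));
    }
}
```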
Normally, Java code has a single flow of execution and runs sequentially. With parallel streams, the data is split into chunks that are processed in parallel on separate cores (by default in the common ForkJoinPool), and the final result is combined from the partial results.
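The split-and-combine idea can be sketched with a parallel reduction. `SplitCombine` and `parallelSum` are hypothetical names. Each chunk of the range is summed separately and the partial sums are merged with `Long::sum`; because addition is associative and `0` is its identity, the result matches the sequential sum.

```java
import java.util.stream.LongStream;

class SplitCombine {
    // The range is split into chunks; each chunk is summed on some thread,
    // and the partial sums are combined with Long::sum.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000)); // 500000500000
    }
}
```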
You need to use forEachOrdered, not forEach.
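A sketch of the corrected program. `ForEachOrderedFix` and `sortedDistinct` are hypothetical names; `main` is the question's pipeline with `forEach` swapped for `forEachOrdered`. Per the `forEachOrdered` Javadoc, elements are processed one at a time in encounter order, and the action for one element happens-before the action for the next, which is why the helper can safely append to a plain ArrayList here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ForEachOrderedFix {
    // forEachOrdered runs the action one element at a time in encounter order
    // (sorted order here), with a happens-before edge between consecutive
    // actions, so appending to a plain ArrayList is safe.
    static List<String> sortedDistinct(List<String> source) {
        List<String> out = new ArrayList<>();
        source.parallelStream()
                .filter(s -> !s.isEmpty())
                .distinct()
                .sorted()
                .forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("C", "H", "A", "A", "B", "F", "");
        list.parallelStream()
                .filter(s -> !s.isEmpty())
                .distinct()
                .sorted()
                .forEachOrdered(System.out::println); // A, B, C, F, H, one per line
    }
}
```

The filtering, deduplication and sorting can still run in parallel; only the final action is serialized into encounter order.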
As per the forEach Javadoc:
For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism. For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization.
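If the action of a parallel `forEach` really must touch shared state, that state has to be thread-safe. A minimal sketch, assuming a synchronized list wrapper is acceptable (`SharedState` and `gather` are hypothetical names; in most cases `collect` is the simpler choice). Every element arrives, but in no guaranteed order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

class SharedState {
    // forEach on a parallel stream may invoke the action concurrently from
    // several threads, so the target list is wrapped to synchronize add().
    static List<Integer> gather(int n) {
        List<Integer> target = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(i -> target.add(i));
        return target;
    }

    public static void main(String[] args) {
        List<Integer> result = gather(1000);
        System.out.println(result.size()); // 1000
        // result is not necessarily [0, 1, 2, ...]; sort it if order matters.
    }
}
```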