Java 8 parallelStream().forEach results in data loss

Tags:

java

There are two test cases which use parallelStream():

Test case 1 (filter, then forEach):

List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
  src.add(i);
}
List<String> strings = new ArrayList<>();

// keep the even numbers and add them to a shared ArrayList from parallel threads
src.parallelStream().filter(integer -> (integer % 2) == 0).forEach(integer -> strings.add(integer + ""));

System.out.println("=size=>" + strings.size());

Output: =size=>9332 (expected 10000)

Test case 2 (forEach only):

List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
  src.add(i);
}
List<String> strings = new ArrayList<>();

// add every number to a shared ArrayList from parallel threads
src.parallelStream().forEach(integer -> strings.add(integer + ""));

System.out.println("=size=>" + strings.size());

Output: =size=>17908 (expected 20000)

Why do I always lose data when using parallelStream? What did I do wrong?

Asked Sep 16 '20 at 07:09 by Hugh




1 Answer

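ArrayList isn't thread-safe. When several threads call add at the same time, they can overwrite each other's writes to the backing array and the size field. Elements are then silently lost, and an ArrayIndexOutOfBoundsException is also possible while the array is being resized. You need to use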

List<String> strings = Collections.synchronizedList(new ArrayList<>());

or

List<String> strings = new Vector<>();

to ensure all updates are synchronized, or switch to

List<String> strings = src.parallelStream()
    .filter(integer -> (integer % 2) == 0)
    .map(integer -> integer + "")
    .collect(Collectors.toList());

and leave the list building to the Streams framework. Note that Collectors.toList() makes no guarantees about the type, mutability, serializability, or thread-safety of the List it returns. If you need a modifiable list, use Collectors.toCollection(ArrayList::new) instead.
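The following is a minimal, self-contained sketch that runs the question's first test case with both fixes from this answer. The class name `ParallelAddDemo` and its two helper methods are hypothetical names chosen for illustration. It uses `Collectors.toCollection(ArrayList::new)` rather than `Collectors.toList()` so the result is guaranteed to be a modifiable `ArrayList`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelAddDemo {

    // Fix 1 (hypothetical helper): keep forEach, but route every add() through a synchronized wrapper.
    static List<String> withSynchronizedList(List<Integer> src) {
        List<String> strings = Collections.synchronizedList(new ArrayList<>());
        src.parallelStream()
           .filter(integer -> integer % 2 == 0)
           .forEach(integer -> strings.add(integer + ""));
        return strings;
    }

    // Fix 2 (hypothetical helper): let the Streams framework build the list.
    // toCollection(ArrayList::new) guarantees a modifiable ArrayList.
    static List<String> withCollect(List<Integer> src) {
        return src.parallelStream()
                  .filter(integer -> integer % 2 == 0)
                  .map(integer -> integer + "")
                  .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        // Same input as the question: the integers 0..19999.
        List<Integer> src = IntStream.range(0, 20000).boxed().collect(Collectors.toList());

        System.out.println("synchronizedList =size=>" + withSynchronizedList(src).size());
        System.out.println("collect          =size=>" + withCollect(src).size());
    }
}
```

Both lines should report 10000, the number of even values in 0..19999, on every run. The collect version also keeps the strings in encounter order ("0", "2", "4", ...). The order of the synchronized list depends on how the threads happen to be scheduled.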

In terms of performance, Stream.collect is likely to be much faster than using Stream.forEach to add to a synchronized collection. With collect, the Streams framework accumulates values into a separate container for each chunk of work without any synchronization, then merges the partial results at the end in a thread-safe way. With a synchronized list, by contrast, every single add has to acquire the same lock.
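To make the per-chunk accumulation concrete, here is a sketch using the three-argument `Stream.collect(supplier, accumulator, combiner)` overload. In a parallel run, the supplier creates a fresh `ArrayList` for each piece of the split input, and the accumulator fills it without locking. The combiner then folds the partial lists together. The class name `ManualCollect` and the method `evensAsStrings` are hypothetical. This illustrates the general mechanism, not the exact internals of `Collectors.toList()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ManualCollect {

    // Hypothetical helper: collect the even numbers as strings with an explicit supplier/accumulator/combiner.
    static ArrayList<String> evensAsStrings(List<Integer> src) {
        return src.parallelStream()
                  .filter(integer -> integer % 2 == 0)
                  .map(integer -> integer + "")
                  .collect(
                      ArrayList::new,     // supplier: a fresh, unshared list for each chunk of work
                      ArrayList::add,     // accumulator: only touches its own chunk's list, so no locking
                      ArrayList::addAll); // combiner: folds the second partial list into the first
    }

    public static void main(String[] args) {
        List<Integer> src = IntStream.range(0, 20000).boxed().collect(Collectors.toList());
        ArrayList<String> result = evensAsStrings(src);
        System.out.println("=size=>" + result.size());
        System.out.println("first=" + result.get(0) + ", last=" + result.get(result.size() - 1));
    }
}
```

Because no list is ever shared between threads while it is being filled, this should print `=size=>10000` and `first=0, last=19998` on every run.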

Answered Oct 17 '22 at 10:10 by markusk