Collecting from a parallel stream in Java 8

I want to take an input, apply a parallel stream to it, and get the output as a list. The input could be any List or any other collection that supports streams.

My concern is that if we want the output as a map, Java gives us an option such as

list.parallelStream().collect(Collectors.toConcurrentMap(args))

But there is no option that I can see to collect from parallel stream in thread safe way to provide list as output. One other option I see is to use

list.parallelStream().collect(Collectors.toCollection(<Concurrent Implementation>))

This way we can pass various concurrent implementations to the collect method. But I think CopyOnWriteArrayList is the only List implementation in java.util.concurrent. We could use one of the queue implementations instead, but those do not behave like a list. What I mean is that these are only workarounds for getting a list.

Could you please advise on the best way to get the output as a list?

Note: I could not find any other post related to this; any reference would be helpful.

Vipul Goyal asked May 20 '17 08:05



2 Answers

The Collection object used to receive the data being collected does not need to be concurrent. You can give it a simple ArrayList.

That is because the values from a parallel stream are not accumulated directly into a single Collection object. Each thread collects its share of the data into its own container, and all the sub-results are then merged into a single final Collection object.

This is all well documented in the Collector javadoc, and a Collector is the parameter you give to the collect() method:

<R,A> R collect(Collector<? super T,A,R> collector)
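
As a minimal sketch of the point above (the class name, method names and the doubling step are made up for illustration, not taken from the question), the following collects from a parallel stream into an ordinary list twice: once with Collectors.toList(), and once with the three-argument collect() overload, where the supplier, accumulator and combiner are spelled out with a plain ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not part of the original question or answer.
class ParallelCollectSketch {

    // Uses the ready-made collector. The result is a plain, non-concurrent List.
    static List<Integer> doubledWithToList(List<Integer> input) {
        return input.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    // Same reduction with the three parts written out:
    //   ArrayList::new    - supplier, creates a fresh container for a partial result
    //   ArrayList::add    - accumulator, adds one element to a container
    //   ArrayList::addAll - combiner, merges one partial container into another
    static List<Integer> doubledExplicit(List<Integer> input) {
        return input.parallelStream()
                .map(n -> n * 2)
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(doubledWithToList(input));
        System.out.println(doubledExplicit(input));
    }
}
```

Because the source list is ordered, both calls should return the elements in encounter order even though the work is split across threads.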
Andreas answered Oct 25 '22 23:10


"But there is no option that I can see to collect from parallel stream in thread safe way to provide list as output."

This is entirely wrong.

The whole point of streams is that you can use a non-thread-safe Collection and still get perfectly valid, thread-safe results. This follows from how streams are implemented, and it was a key part of their design. A Collector defines a supplier method that creates a new container instance for each chunk of the stream that is processed separately. Those instances are then merged with one another by the collector's combiner.

So this is perfectly thread safe:

 Stream.of(1,2,3,4).parallel()
          .collect(Collectors.toList());

Since there are 4 elements in this stream, up to 4 instances of ArrayList may be created and then merged into a single result at the end. The exact number depends on how the stream is split, which in turn is influenced by the available parallelism.

Concurrent collectors such as toConcurrentMap, on the other hand, create a single result container, and all threads put their results into it.
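
One way to observe this difference is to count how often the container supplier is called. The sketch below is only an illustration: the class and method names are invented, and the single-container behaviour of toConcurrentMap reflects how concurrent, unordered collectors are documented to be evaluated on a parallel stream, so treat the printed numbers as observations rather than guarantees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class; not part of the original answer.
class ContainerCountSketch {

    // Counts how many ArrayList containers a non-concurrent collector creates.
    static int listContainersCreated(List<Integer> input) {
        AtomicInteger created = new AtomicInteger();
        List<Integer> result = input.parallelStream()
                .collect(Collectors.toCollection(() -> {
                    created.incrementAndGet();          // one call per partial container
                    return new ArrayList<Integer>();
                }));
        if (result.size() != input.size()) {
            throw new IllegalStateException("elements were lost");
        }
        return created.get();
    }

    // Counts how many map containers the concurrent collector creates.
    static int mapContainersCreated(List<Integer> input) {
        AtomicInteger created = new AtomicInteger();
        ConcurrentMap<Integer, Integer> result = input.parallelStream()
                .collect(Collectors.toConcurrentMap(
                        n -> n,                         // key: the element itself
                        n -> n * n,                     // value: its square
                        (a, b) -> a,                    // merge function for duplicate keys
                        () -> {
                            created.incrementAndGet();  // shared container creation
                            return new ConcurrentHashMap<Integer, Integer>();
                        }));
        if (result.size() != input.size()) {
            throw new IllegalStateException("elements were lost");
        }
        return created.get();
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        System.out.println("ArrayList containers: " + listContainersCreated(input));
        System.out.println("ConcurrentHashMap containers: " + mapContainersCreated(input));
    }
}
```

The ArrayList count varies with the machine and the size of the input, while the map count is expected to be 1 because every thread writes into the same ConcurrentHashMap.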

Eugene answered Oct 25 '22 22:10