
Is it safe to use parallelStream() to populate a Map in Java 8?

I have a list of 1 million objects that I need to put into a Map. To reduce the time this takes, I am planning to use Java 8's parallelStream() like this:

List<Person> list = new LinkedList<>();
Map<String, String> map = new HashMap<>();
list.parallelStream().forEach(person ->{
    map.put(person.getName(), person.getAge());
});

Is it safe to populate a Map like this from parallel threads? Couldn't this cause concurrency issues, with some data getting lost in the Map?

OneMoreError asked Oct 25 '16 10:10




2 Answers

It is safe to use parallelStream() to collect into a HashMap. However, it is not safe to combine parallelStream() with forEach and a consumer that adds entries to a HashMap.

HashMap is not a synchronized class, and putting elements into it concurrently will not work properly. That is exactly what forEach does here: it invokes the given consumer, which puts elements into the HashMap, from multiple threads, possibly at the same time. Here is simple code demonstrating the issue:

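import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

List<Integer> list = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
Map<Integer, Integer> map = new HashMap<>();
// Unsafe: several threads call put() on the same unsynchronized HashMap.
list.parallelStream().forEach(i -> {
    map.put(i, i);
});
System.out.println(list.size());
System.out.println(map.size());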

Make sure to run it a couple of times. There's a very good chance (the joy of concurrency) that the printed map size after the operation is not 10000, which is the size of the list, but slightly less.

The solution here, as always, is not to use forEach, but to use a mutable reduction approach with the collect method and the built-in toMap:

Map<Integer, Integer> map = list.parallelStream().collect(Collectors.toMap(i -> i, i -> i));

Use that line of code in the sample above, and you can rest assured that the map size will always be 10000. The Stream API ensures that it is safe to collect into a non-thread-safe container, even in parallel. This also means that you don't need toConcurrentMap to be safe. That collector is needed only if you specifically want a ConcurrentMap as the result rather than a general Map; as far as thread safety with collect is concerned, you can use either.
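Applied to the question's data, the collect approach might look like the sketch below. The Person class here is a hypothetical stand-in (the question does not show it), and the merge function is an assumption about how duplicate names should be handled. It matters because the two-argument toMap throws an IllegalStateException when two elements map to the same key.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class ToMapSketch {
    // Hypothetical stand-in for the question's Person class.
    static final class Person {
        private final String name;
        private final String age;

        Person(String name, String age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        String getAge() { return age; }
    }

    // Collects into a general Map. For an ordered source such as a List,
    // the merge function keeps the value that comes later in encounter order.
    static Map<String, String> toPlainMap(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.toMap(Person::getName, Person::getAge,
                        (first, second) -> second));
    }

    // Collects into a ConcurrentMap. This collector is unordered, so for a
    // duplicate name it is not predictable which value ends up in the map.
    static ConcurrentMap<String, String> toConcurrentMap(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.toConcurrentMap(Person::getName, Person::getAge,
                        (first, second) -> second));
    }
}
```

Both methods are safe to call on a parallel stream. If names are guaranteed to be unique, the merge function can be dropped.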

Tunaki answered Sep 18 '22 14:09



HashMap isn't thread-safe, but ConcurrentHashMap is; use that instead:

Map<String, String> map = new ConcurrentHashMap<>();

and your code will work as expected.
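As a minimal sketch of that change, the following reuses the 10000-integer demonstration from the other answer with a ConcurrentHashMap. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ConcurrentPutSketch {
    // Fills a ConcurrentHashMap with the keys 0..n-1 from a parallel stream.
    static Map<Integer, Integer> populate(int n) {
        List<Integer> list = IntStream.range(0, n).boxed().collect(Collectors.toList());
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        // Safe: ConcurrentHashMap supports concurrent put() calls.
        list.parallelStream().forEach(i -> map.put(i, i));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(populate(10000).size()); // prints 10000
    }
}
```

Two caveats apply: ConcurrentHashMap rejects null keys and null values with a NullPointerException, and when several elements share a key, which value remains depends on thread timing.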


Performance comparison of forEach() vs toMap()

After JVM warm-up, with 1M elements, using parallel streams and taking median timings, the forEach() version was consistently 2-3 times faster than the toMap() version.

Results were consistent between all-unique, 25% duplicate and 100% duplicate inputs.
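The benchmark code itself is not shown, so the following is only a rough sketch of how such a comparison could be set up. It assumes the forEach() version writes into a ConcurrentHashMap, uses all-unique keys, and uses made-up run counts. A harness such as JMH would give more trustworthy numbers than this hand-rolled timing loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PopulateTimingSketch {
    // forEach() variant; assumed to target a ConcurrentHashMap.
    static Map<Integer, Integer> viaForEach(List<Integer> list) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        list.parallelStream().forEach(i -> map.put(i, i));
        return map;
    }

    // toMap() variant; requires unique keys because no merge function is given.
    static Map<Integer, Integer> viaToMap(List<Integer> list) {
        return list.parallelStream().collect(Collectors.toMap(i -> i, i -> i));
    }

    // Runs the task the given number of times and returns the median duration in nanoseconds.
    static long medianNanos(Supplier<Map<Integer, Integer>> task, int runs) {
        long[] times = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.get();
            times[r] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2];
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        // Warm-up rounds so the JIT compiler has a chance to optimize both paths.
        medianNanos(() -> viaForEach(list), 20);
        medianNanos(() -> viaToMap(list), 20);
        System.out.println("forEach median ns: " + medianNanos(() -> viaForEach(list), 21));
        System.out.println("toMap   median ns: " + medianNanos(() -> viaToMap(list), 21));
    }
}
```

The absolute numbers depend on the machine, the JVM version and the number of cores, so the ratio may differ from the one reported above.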

Bohemian answered Sep 20 '22 14:09