
How do I know if Java Stream collect(Collectors.toMap) is parallelized?

I have the following code that attempts to populate a Map from a List in a parallel fashion by going through the Java Stream API:

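import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NameId {...}

public class TestStream
{
    public static void main(String[] args)
    {
        List<NameId> niList = new ArrayList<>();
        niList.add(new NameId("Alice", "123456"));
        niList.add(new NameId("Bob", "223456"));
        niList.add(new NameId("Carl", "323456"));

        // Request a parallel stream, then collect name -> id pairs into a Map
        Stream<NameId> niStream = niList.parallelStream();
        Map<String, String> niMap = niStream.collect(Collectors.toMap(NameId::getName, NameId::getId));
    }
}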

How do I know if the map is populated using multiple threads, i.e. in parallel? Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap? Is this a reasonable way to parallelize the population of a map? How do I know what the concrete map is backing the new niMap (e.g. is it HashMap)?

user1332148 asked Dec 05 '15 00:12




2 Answers

From the Javadoc:

The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are inserted into the Map in encounter order, using toConcurrentMap(Function, Function) may offer better parallel performance.

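So toMap still runs in parallel on a parallel stream, but each worker thread fills its own map and the combiner then merges those maps, which is the expensive part. toConcurrentMap instead lets all threads insert into a single shared concurrent map, so no merge step is needed.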

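In the JDK 8 source the backing map is a HashMap: the two-argument toMap just calls the overload that takes a Supplier<M> and passes HashMap::new (source: the JDK source for Collectors). That is an implementation detail rather than a documented guarantee.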
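If the concrete type matters, one option is to inspect it at runtime with getClass(), or better, to choose it yourself through the four-argument toMap overload. The sketch below is only an illustration: NameId here is a hypothetical stand-in for the question's elided class, and MapTypeDemo, sample, defaultMap and explicitMap are made-up names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MapTypeDemo {
    // Hypothetical stand-in for the question's elided NameId class.
    static final class NameId {
        private final String name;
        private final String id;
        NameId(String name, String id) { this.name = name; this.id = id; }
        String getName() { return name; }
        String getId() { return id; }
    }

    static List<NameId> sample() {
        return Arrays.asList(
                new NameId("Alice", "123456"),
                new NameId("Bob", "223456"),
                new NameId("Carl", "323456"));
    }

    // Two-argument toMap: the library picks the Map implementation.
    static Map<String, String> defaultMap(List<NameId> list) {
        return list.parallelStream()
                .collect(Collectors.toMap(NameId::getName, NameId::getId));
    }

    // Four-argument toMap: the caller picks the implementation via the supplier.
    static Map<String, String> explicitMap(List<NameId> list) {
        return list.parallelStream()
                .collect(Collectors.toMap(
                        NameId::getName,
                        NameId::getId,
                        (first, second) -> first, // merge rule for duplicate keys (none in this data)
                        TreeMap::new));
    }

    public static void main(String[] args) {
        // Implementation detail: may differ between JDK versions.
        System.out.println(defaultMap(sample()).getClass());
        // Determined by the supplier passed above.
        System.out.println(explicitMap(sample()).getClass());
    }
}
```

Passing the supplier explicitly removes the guesswork, at the cost of also having to supply a merge function.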

Cardano answered Sep 24 '22 23:09


How do I know if the map is populated using multiple threads, i.e. in parallel?

It is hard to tell from the outside. If your code is running surprisingly slowly, it could be because you are trying to use multiple threads.
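One way to get a rough idea is to record the name of the thread that runs the key mapper. This is a minimal sketch, not a definitive test: ThreadNamesDemo, THREADS and collect are made-up names, and the name/id pairs are modelled as String arrays instead of the question's NameId class. With only three elements the runtime may well do all the work on the calling thread, so seeing a single name does not prove the stream is sequential.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ThreadNamesDemo {
    // Names of the threads that executed the key mapper (thread-safe set).
    static final Set<String> THREADS = ConcurrentHashMap.newKeySet();

    static Map<String, String> collect(List<String[]> pairs) {
        return pairs.parallelStream().collect(Collectors.toMap(
                p -> {
                    // Side effect used only for observation.
                    THREADS.add(Thread.currentThread().getName());
                    return p[0];
                },
                p -> p[1]));
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"Alice", "123456"},
                new String[] {"Bob", "223456"},
                new String[] {"Carl", "323456"});
        Map<String, String> map = collect(pairs);
        System.out.println(map);
        // The set of names varies from run to run and machine to machine.
        System.out.println("Threads used: " + THREADS);
    }
}
```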

Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap?

This would help make the parallel collection more efficient or, put another way, a little less inefficient.
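For reference, swapping the collector is a small change. The sketch below assumes the name/id pairs are held as String arrays (a stand-in for the question's NameId class); ConcurrentMapDemo and collect are made-up names. Note that toConcurrentMap, like toMap, throws IllegalStateException on duplicate keys unless a merge function is supplied.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class ConcurrentMapDemo {
    // All worker threads insert into one shared concurrent map,
    // so there is no per-thread map to merge afterwards.
    static ConcurrentMap<String, String> collect(List<String[]> pairs) {
        return pairs.parallelStream()
                .collect(Collectors.toConcurrentMap(p -> p[0], p -> p[1]));
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"Alice", "123456"},
                new String[] {"Bob", "223456"},
                new String[] {"Carl", "323456"});
        ConcurrentMap<String, String> map = collect(pairs);
        System.out.println(map.get("Alice")); // prints 123456
    }
}
```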

Is this a reasonable way to parallelize the population of a map?

You can do it as you suggest; however, note that the cost of starting a new thread is far greater than everything you are doing here, so adding even one thread will slow it down a lot.
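A crude way to see this on your own machine is to time both variants. The sketch below is not a proper benchmark: a single cold measurement is dominated by JIT warm-up and thread-pool start-up, so the numbers are only indicative and a harness such as JMH would be needed for anything rigorous. TinyTimingDemo, build and nanos are made-up names, and String arrays stand in for the question's NameId class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TinyTimingDemo {
    // Builds the same map either sequentially or in parallel.
    static Map<String, String> build(List<String[]> pairs, boolean parallel) {
        Stream<String[]> stream = parallel ? pairs.parallelStream() : pairs.stream();
        return stream.collect(Collectors.toMap(p -> p[0], p -> p[1]));
    }

    // Wall-clock time for one build, in nanoseconds (very noisy).
    static long nanos(List<String[]> pairs, boolean parallel) {
        long start = System.nanoTime();
        build(pairs, parallel);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"Alice", "123456"},
                new String[] {"Bob", "223456"},
                new String[] {"Carl", "323456"});
        System.out.println("sequential ns: " + nanos(pairs, false));
        System.out.println("parallel   ns: " + nanos(pairs, true));
    }
}
```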

How do I know what the concrete map is backing the new niMap (e.g. is it HashMap)?

The documentation says you can't know for sure: it makes no guarantees on the type, mutability, serializability, or thread-safety of the Map returned. The last time I checked, both toMap and groupingBy used HashMap, but you can't assume it is any particular Map.

Peter Lawrey answered Sep 25 '22 23:09