How do I know if Java Stream collect(Collectors.toMap) is parallelized?

Tags: java, parallel-processing, java-stream

I have the following code that attempts to populate a Map from a List in a parallel fashion by going through the Java Stream API:

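import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NameId {...}

public class TestStream
{
    public static void main(String[] args)
    {
        List<NameId> niList = new ArrayList<>();
        niList.add(new NameId("Alice", "123456"));
        niList.add(new NameId("Bob", "223456"));
        niList.add(new NameId("Carl", "323456"));

        // Request a parallel stream over the list
        Stream<NameId> niStream = niList.parallelStream();
        // Collect name -> id pairs into a Map
        Map<String, String> niMap = niStream.collect(Collectors.toMap(NameId::getName, NameId::getId));
    }
}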

How do I know if the map is populated using multiple threads, i.e. in parallel? Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap? Is this a reasonable way to parallelize the population of a map? How do I know what the concrete map is backing the new niMap (e.g. is it HashMap)?

843

asked Dec 05 '15 00:12

user1332148

2 Answers

From the Javadoc:

The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are inserted into the Map in encounter order, using toConcurrentMap(Function, Function) may offer better parallel performance.

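So toMap still works on a parallel stream, but each thread accumulates into its own map and the partial maps are then merged, which is the expensive part. It sounds like toConcurrentMap will instead parallelize the inserts themselves: all threads insert into a single shared ConcurrentMap, so no merge is needed, at the cost of encounter order.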
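The difference the Javadoc describes can be sketched as follows. This is only an illustration: the nested NameId class is a hypothetical stand-in for the class elided in the question, and the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class ConcurrentCollectDemo {
    // Hypothetical stand-in for the NameId class elided in the question.
    static final class NameId {
        private final String name;
        private final String id;
        NameId(String name, String id) { this.name = name; this.id = id; }
        String getName() { return name; }
        String getId() { return id; }
    }

    static List<NameId> sample() {
        return Arrays.asList(
            new NameId("Alice", "123456"),
            new NameId("Bob", "223456"),
            new NameId("Carl", "323456"));
    }

    // Non-concurrent collector: on a parallel stream, partial maps are
    // built separately and then merged by the combiner.
    static Map<String, String> withToMap(List<NameId> list) {
        return list.parallelStream()
                   .collect(Collectors.toMap(NameId::getName, NameId::getId));
    }

    // Concurrent collector: on a parallel stream, worker threads insert
    // into one shared ConcurrentMap, so there is no merge step.
    static ConcurrentMap<String, String> withToConcurrentMap(List<NameId> list) {
        return list.parallelStream()
                   .collect(Collectors.toConcurrentMap(NameId::getName, NameId::getId));
    }

    public static void main(String[] args) {
        System.out.println(withToMap(sample()));
        System.out.println(withToConcurrentMap(sample()));
    }
}
```

Both methods produce the same key/value pairs; they differ only in how the work is combined, so the choice matters mainly for large inputs where merging maps becomes costly.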

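The backing map is, by default, a HashMap. In the JDK 8 source, the two-argument toMap just calls the four-argument version of toMap, which takes a merge function and a Supplier<M>, and passes HashMap::new as the supplier. (Source: the JDK source code.)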
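If the concrete map type matters, one option is to call the four-argument toMap overload directly and pass your own supplier instead of relying on the default. The sketch below assumes a TreeMap is wanted; the nested NameId class is again a hypothetical stand-in for the one elided in the question, and the merge function simply rejects duplicate keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapSupplierDemo {
    // Hypothetical stand-in for the NameId class elided in the question.
    static final class NameId {
        private final String name;
        private final String id;
        NameId(String name, String id) { this.name = name; this.id = id; }
        String getName() { return name; }
        String getId() { return id; }
    }

    static List<NameId> sample() {
        return Arrays.asList(
            new NameId("Carl", "323456"),
            new NameId("Alice", "123456"),
            new NameId("Bob", "223456"));
    }

    // The fourth argument chooses the map implementation explicitly.
    static Map<String, String> intoTreeMap(List<NameId> list) {
        return list.parallelStream().collect(Collectors.toMap(
            NameId::getName,
            NameId::getId,
            (first, second) -> {
                // Assumption for this sketch: duplicate names are an error.
                throw new IllegalStateException("Duplicate key");
            },
            TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, String> map = intoTreeMap(sample());
        // getClass() reveals the concrete type at runtime.
        System.out.println(map.getClass().getName() + " " + map);
    }
}
```

Calling getClass() on the result of the two-argument toMap is also a quick way to see which implementation a given JDK happens to use.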

106

answered Sep 24 '22 23:09

Cardano

How do I know if the map is populated using multiple threads, i.e. in parallel?

It is hard to tell. If your code runs surprisingly slowly, it could be because it is using multiple threads.
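One rough way to observe this is to record which threads touch the elements while the stream is collected. The sketch below is an illustration only: the class and method names are made up, and generated strings replace the NameId objects from the question. Which threads appear, and how many, depends on the JVM, the machine, and the input size.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ThreadObserverDemo {
    // Collects n generated entries in parallel and returns the names of
    // all threads that processed at least one element.
    static Set<String> threadsUsed(int n) {
        // Thread-safe set, because several threads may add to it at once.
        Set<String> threads = ConcurrentHashMap.newKeySet();
        Map<String, String> map = IntStream.range(0, n)
            .parallel()
            .boxed()
            .peek(i -> threads.add(Thread.currentThread().getName()))
            .collect(Collectors.toMap(i -> "name-" + i, i -> "id-" + i));
        if (map.size() != n) {
            throw new IllegalStateException("unexpected map size");
        }
        return threads;
    }

    public static void main(String[] args) {
        System.out.println("3 elements:       " + threadsUsed(3));
        System.out.println("100000 elements:  " + threadsUsed(100_000));
    }
}
```

If the printed set contains more than one name, more than one thread took part in populating the map. A set with a single name does not prove the stream was sequential; the runtime may simply not have split the work.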

Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap?

This would make the parallel collection more efficient or, put another way, a little less inefficient.

Is this a reasonable way to parallelize the population of a map?

You can do it as you suggest. However, note that the cost of starting a new thread is far higher than the cost of everything you are doing here, so adding even one thread will slow it down a lot.
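To check this on your own machine, a crude timing comparison between the sequential and parallel versions can help. The sketch below is not a rigorous benchmark (a harness such as JMH would be needed for that), uses generated strings instead of the question's NameId objects, and all names in it are made up for illustration. No particular result is implied; the numbers vary between runs and machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class SmallListTimingDemo {
    static List<String> names(int n) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add("name-" + i);
        }
        return list;
    }

    // Collects name -> length, either sequentially or in parallel.
    static Map<String, Integer> collect(List<String> list, boolean parallel) {
        return (parallel ? list.parallelStream() : list.stream())
            .collect(Collectors.toMap(Function.identity(), String::length));
    }

    // Average wall-clock nanoseconds per collect() call over the given rounds.
    static long nanosFor(List<String> list, boolean parallel, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            collect(list, parallel);
        }
        return (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) {
        List<String> small = names(3);
        // Warm-up so that JIT compilation distorts the timings less.
        nanosFor(small, false, 10_000);
        nanosFor(small, true, 10_000);
        System.out.println("sequential ns/op: " + nanosFor(small, false, 10_000));
        System.out.println("parallel   ns/op: " + nanosFor(small, true, 10_000));
    }
}
```

Changing the argument to names(...) shows how the comparison shifts as the list grows.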

How do I know what the concrete map is backing the new niMap (e.g. is it HashMap)?

The documentation says you can't know for sure. The last time I checked, toMap and groupingBy both used HashMap, but you can't assume it is any particular Map.

answered Sep 25 '22 23:09

Peter Lawrey
