
How many data structures are created when collecting with a Java parallel stream?

Stream<String> stream = Stream.of("w", "o", "l", "f").parallel();
Set<String> set = stream.collect(TreeSet::new,
        Set::add,
        Set::addAll);
System.out.println(set);  // [f, l, o, w]

How many TreeSet objects will be created upon executing the code above?

If I grasp this correctly, there will be two TreeSet objects created: one by the accumulator and another by the combiner.

sjk asked Oct 19 '25 02:10


1 Answer

You can test your hypothesis by replacing TreeSet with a subclass that reports every instantiation. A simple example:

package com.livanov.playground;

import org.junit.Test;

import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

public class MyTest {
    @Test
    public void countsInstantiations() {
        Stream<String> stream = Stream.of("w", "o", "l", "f").parallel();
        // MySet::new is the supplier; every call to it prints a message.
        Set<String> set = stream.collect(
                MySet::new, Set::add, Set::addAll
        );
        System.out.println(set);  // [f, l, o, w]
    }

    // A TreeSet that logs each time a new instance is constructed.
    static class MySet extends TreeSet<String> {
        public MySet() {
            System.out.println("instantiated");
        }
    }
}

On my computer this prints the "instantiated" message 4 times, so 4 instances of MySet are created (and 4 instances of TreeSet in your scenario). Your hypothesis of 2 will therefore only be correct in some situations.

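The reason is that you use a parallel stream: the work is split into parts, and the supplier (the first argument to collect) is called to create a fresh result container for each part. The accumulator and the combiner do not create containers themselves; the accumulator adds an element to an existing container, and the combiner merges one container into another. How many parts the job is split into, and how many threads run them, is implementation and environment specific. You can see more info here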
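The same experiment can be run without JUnit by counting supplier calls directly. The following is only a sketch: the class name SupplierCallCounter and the method countContainers are made up for illustration, and the parallel count it reports may differ between JDK versions and machines.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original answer.
class SupplierCallCounter {

    // Returns how many result containers the supplier created for one collect call.
    static int countContainers(boolean parallel) {
        AtomicInteger created = new AtomicInteger();

        // The supplier is the only place where a new TreeSet is constructed.
        Supplier<Set<String>> countingSupplier = () -> {
            created.incrementAndGet();
            return new TreeSet<>();
        };

        Stream<String> stream = Stream.of("w", "o", "l", "f");
        if (parallel) {
            stream = stream.parallel();
        }

        // Accumulator and combiner only mutate containers the supplier produced.
        Set<String> set = stream.collect(countingSupplier, Set::add, Set::addAll);
        if (set.size() != 4) {
            throw new IllegalStateException("unexpected result: " + set);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + countContainers(false));
        // The parallel count depends on how the runtime splits the source.
        System.out.println("parallel:   " + countContainers(true));
    }
}
```

A sequential stream needs only one container, so the first line reports 1. The parallel figure is whatever your runtime chooses for a four-element source, which is why a fixed guess such as 2 is not reliable.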

LIvanov answered Oct 21 '25 16:10