Java 8 Parallel Stream Concurrent Grouping

Suppose I have the following class:

class Person {
  String name;
  String uid;
  String phone;
}

I am trying to group by all the fields of the class. How do I use parallel streams in Java 8 to convert a

List<Person> into Map<String, Set<Person>>

where each key of the map is the value of one of the fields of the class? The following Java 8 example groups by a single field. How can I do this for all fields of a class, collecting into a single Map?

ConcurrentMap<Person.Sex, List<Person>> byGender =
roster
    .parallelStream()
    .collect(
        Collectors.groupingByConcurrent(Person::getGender));

user3665053 asked Dec 16 '17 18:12



2 Answers

You can do that with the static factory method Collector.of:

Map<String, Set<Person>> groupBy = persons.parallelStream()
    .collect(Collector.of(
        ConcurrentHashMap::new,
        ( map, person ) -> {
            map.computeIfAbsent(person.name, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.uid, k -> new HashSet<>()).add(person);
            map.computeIfAbsent(person.phone, k -> new HashSet<>()).add(person);
        },
        ( a, b ) -> {
            b.forEach(( key, set ) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set));
            return a;
        }
    ));
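
Note that a collector created this way is not marked CONCURRENT, so in a parallel stream each worker gets its own map from the supplier and the combiner merges them afterwards. If you want all threads to accumulate into one shared map instead, a rough sketch could look like the following. The class name ConcurrentFieldGrouping, the helper groupConcurrently and the sample data are made up for illustration, and the fields are assumed to be non-null because ConcurrentHashMap rejects null keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;

public class ConcurrentFieldGrouping {

    // Mirrors the Person class from the question; the constructor is added for the demo.
    static class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    // Hypothetical helper: all threads accumulate into one shared ConcurrentHashMap.
    static ConcurrentMap<String, Set<Person>> groupConcurrently(List<Person> persons) {
        return persons.parallelStream().collect(
            Collector.<Person, ConcurrentMap<String, Set<Person>>>of(
                ConcurrentHashMap::new,
                (map, person) -> {
                    // The value sets are shared between threads too, so they must be
                    // thread-safe: newKeySet() is used instead of HashSet.
                    map.computeIfAbsent(person.name, k -> ConcurrentHashMap.newKeySet()).add(person);
                    map.computeIfAbsent(person.uid, k -> ConcurrentHashMap.newKeySet()).add(person);
                    map.computeIfAbsent(person.phone, k -> ConcurrentHashMap.newKeySet()).add(person);
                },
                (a, b) -> {
                    // Still required by the API, even if a concurrent reduction never calls it.
                    b.forEach((key, set) ->
                        a.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).addAll(set));
                    return a;
                },
                Collector.Characteristics.CONCURRENT,
                Collector.Characteristics.UNORDERED));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Bob", "u2", "555-0100"));
        ConcurrentMap<String, Set<Person>> grouped = groupConcurrently(persons);
        System.out.println(grouped.get("555-0100").size()); // 2, both persons share this phone
    }
}
```

Whether this is faster than merging per-thread maps depends on the data and the amount of contention, so it is worth measuring before choosing it.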

As Holger suggested in the comments, the following approach is preferable to the first snippet above:

Map<String, Set<Person>> groupBy = persons.parallelStream()
     .collect(HashMap::new, (m, p) -> {
         m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
         m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
     }, (a, b) -> b.forEach((key, set) ->
         a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));

It uses the three-argument overload of collect, which behaves the same as the Collector.of statement above.
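
For anyone who wants to try it out, here is a minimal, self-contained sketch of the three-argument collect approach. The class name GroupByAllFields, the helper groupByAllFields and the sample data are only for illustration, and the fields are assumed to be non-null.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupByAllFields {

    // Same shape as the Person class in the question; the constructor is added for the demo.
    static class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    // Registers every person under each of its three field values.
    static Map<String, Set<Person>> groupByAllFields(List<Person> persons) {
        return persons.parallelStream()
            .collect(HashMap::new,
                (m, p) -> {
                    m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
                    m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
                    m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
                },
                // Merges the partial map of one worker into another.
                (a, b) -> b.forEach((key, set) ->
                    a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Bob", "u2", "555-0100"));
        Map<String, Set<Person>> grouped = groupByAllFields(persons);
        System.out.println(grouped.keySet().size());        // 5 distinct field values
        System.out.println(grouped.get("555-0100").size()); // 2, both persons share this phone
    }
}
```

Keep in mind that a name, a uid and a phone that happen to be the same string end up under the same key, since the map does not record which field a key came from.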

Lino answered Oct 12 '22 02:10


You can chain your grouping collectors, which gives you a multi-level map. However, this is not ideal if you want to group by more than, say, two fields.
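
To illustrate what chaining looks like, here is a small sketch for two of the fields. The class name ChainedGrouping, the helper byNameThenUid and the sample data are hypothetical. Note that the result is a nested map keyed first by name and then by uid, not a flat Map<String, Set<Person>>.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class ChainedGrouping {

    // Same shape as the Person class in the question; the constructor is added for the demo.
    static class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    // Outer level groups by name, inner level groups by uid.
    static ConcurrentMap<String, ConcurrentMap<String, Set<Person>>> byNameThenUid(List<Person> persons) {
        return persons.parallelStream()
            .collect(Collectors.groupingByConcurrent(
                (Person p) -> p.name,
                Collectors.groupingByConcurrent(
                    (Person p) -> p.uid,
                    Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Alice", "u2", "555-0101"),
            new Person("Bob", "u3", "555-0102"));
        ConcurrentMap<String, ConcurrentMap<String, Set<Person>>> nested = byNameThenUid(persons);
        System.out.println(nested.get("Alice").keySet().size()); // 2 uids under "Alice"
    }
}
```

Every additional field adds another level of nesting to the result type, which is why this gets unwieldy quickly.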

The better option would be to override the equals and hashCode methods in your Person class to define the equality of two objects, which in this case would cover all of the fields. Then you can group by Person itself, i.e. groupingByConcurrent(Function.identity()), in which case you'll end up with:

ConcurrentMap<Person, List<Person>> resultSet = ....

Example:

class Person {
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Person person = (Person) o;

        if (name != null ? !name.equals(person.name) : person.name != null) return false;
        if (uid != null ? !uid.equals(person.uid) : person.uid != null) return false;
        return phone != null ? phone.equals(person.phone) : person.phone == null;
    }

    @Override
    public int hashCode() {
        int result = name != null ? name.hashCode() : 0;
        result = 31 * result + (uid != null ? uid.hashCode() : 0);
        result = 31 * result + (phone != null ? phone.hashCode() : 0);
        return result;
    }

    private String name;
    private String uid; // these should be private, don't expose
    private String phone;

   // getters where necessary
   // setters where necessary
}

then:

ConcurrentMap<Person, List<Person>> resultSet = list.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity()));
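
Put together as a runnable sketch, it could look like this. The class name IdentityGrouping, the helper groupByIdentity and the sample data are made up for the demo, and equals/hashCode are written with java.util.Objects as a null-safe shorthand for the field-by-field version above. The keys of the resulting map are Person instances, so equal persons are grouped together.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IdentityGrouping {

    static class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Person other = (Person) o;
            return Objects.equals(name, other.name)
                && Objects.equals(uid, other.uid)
                && Objects.equals(phone, other.phone);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, uid, phone);
        }
    }

    // Groups persons that are equal in all three fields under one key.
    static ConcurrentMap<Person, List<Person>> groupByIdentity(List<Person> list) {
        return list.parallelStream()
            .collect(Collectors.groupingByConcurrent(Function.identity()));
    }

    public static void main(String[] args) {
        List<Person> list = Arrays.asList(
            new Person("Alice", "u1", "555-0100"),
            new Person("Alice", "u1", "555-0100"),
            new Person("Bob", "u2", "555-0101"));
        ConcurrentMap<Person, List<Person>> resultSet = groupByIdentity(list);
        System.out.println(resultSet.size()); // 2 distinct persons
        System.out.println(resultSet.get(new Person("Alice", "u1", "555-0100")).size()); // 2
    }
}
```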

Ousmane D. answered Oct 12 '22 01:10