Suppose I have a class like this:
class Person {
String name;
String uid;
String phone;
}
I am trying to group by all the fields of the class. How do I use parallel streams in Java 8 to convert a
List<Person> into Map<String,Set<Person>>
where each key of the map is the value of one of the fields in the class? The following Java 8 example groups by a single field. How can I do this for all fields of a class, collecting into a single Map?
ConcurrentMap<Person.Sex, List<Person>> byGender =
roster
.parallelStream()
.collect(
Collectors.groupingByConcurrent(Person::getGender));
Parallel streams can actually slow you down. A parallel stream breaks the work into subproblems that run on separate threads, possibly on different cores, and the partial results are combined when they are done. This all happens under the hood using the fork/join framework.
Parallel streams are a feature of Java 8 and higher, meant for utilizing multiple processor cores. Without them, Java code normally runs as a single sequential flow of execution.
To work around this, you can create your own thread pool for processing the stream. ForkJoinPool fjp = new ForkJoinPool(parallelism); creates a ForkJoinPool with the given target parallelism level. If you use the no-argument constructor instead, the parallelism defaults to the number of available processors.
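As a rough sketch of the custom-pool idea above: the snippet below submits a parallel stream pipeline as a task to a dedicated ForkJoinPool. The class name CustomPoolExample and the method squaresInPool are hypothetical names chosen for illustration. Note that a parallel stream running inside the pool it was submitted to is observed behavior of the current implementation rather than something the Stream API documentation promises, so treat this as an assumption.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CustomPoolExample {

    // Hypothetical helper: squares 1..upTo using a parallel stream that is
    // started from inside a dedicated pool rather than the common pool.
    static List<Integer> squaresInPool(int parallelism, int upTo) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // Declared as a Callable so the submit(Callable) overload is chosen.
            Callable<List<Integer>> task = () ->
                    IntStream.rangeClosed(1, upTo)
                            .parallel()
                            .map(i -> i * i)
                            .boxed()
                            .collect(Collectors.toList());
            // join() waits for the result and throws only unchecked exceptions.
            return pool.submit(task).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squaresInPool(4, 5)); // prints [1, 4, 9, 16, 25]
    }
}
```

The pool is shut down in a finally block so its worker threads do not outlive the call.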
You can do that by using the static factory method Collector.of:
Map<String, Set<Person>> groupBy = persons.parallelStream()
.collect(Collector.of(
ConcurrentHashMap::new,
( map, person ) -> {
map.computeIfAbsent(person.name, k -> new HashSet<>()).add(person);
map.computeIfAbsent(person.uid, k -> new HashSet<>()).add(person);
map.computeIfAbsent(person.phone, k -> new HashSet<>()).add(person);
},
( a, b ) -> {
b.forEach(( key, set ) -> a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set));
return a;
}
));
As Holger suggested in the comments, the following approach may be preferable to the one above:
Map<String, Set<Person>> groupBy = persons.parallelStream()
.collect(HashMap::new, (m, p) -> {
m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
}, (a, b) -> b.forEach((key, set) -> {
a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set);
}));
It uses the three-argument overload of collect (supplier, accumulator, combiner), which behaves identically to the Collector.of version above.
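For reference, here is a self-contained sketch of the three-argument collect approach with some sample data. The wrapper class GroupByAllFields, the method groupByAnyField, the Person constructor and the sample values are all assumptions added for illustration; only the field names come from the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GroupByAllFields {

    // Minimal stand-in for the Person class from the question.
    static class Person {
        final String name;
        final String uid;
        final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }
    }

    // Registers every person under the value of each of its three fields.
    static Map<String, Set<Person>> groupByAnyField(List<Person> persons) {
        return persons.parallelStream().collect(
                HashMap::new,                       // one map per worker thread
                (m, p) -> {
                    m.computeIfAbsent(p.name, k -> new HashSet<>()).add(p);
                    m.computeIfAbsent(p.uid, k -> new HashSet<>()).add(p);
                    m.computeIfAbsent(p.phone, k -> new HashSet<>()).add(p);
                },
                (a, b) -> b.forEach((key, set) ->   // merge partial maps
                        a.computeIfAbsent(key, k -> new HashSet<>()).addAll(set)));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", "u1", "555"),
                new Person("Bob", "u2", "555"),
                new Person("Alice", "u3", "777"));

        Map<String, Set<Person>> byField = groupByAnyField(persons);
        System.out.println(byField.size());              // prints 7
        System.out.println(byField.get("Alice").size()); // prints 2
        System.out.println(byField.get("555").size());   // prints 2
    }
}
```

One thing to keep in mind with a single Map<String, Set<Person>>: values from different fields share the same key space, so a name that happens to equal someone's phone number would land in the same bucket.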
You can chain your grouping collectors, which would give you a multi-level map. However, this is not ideal if you want to group by, say, more than two fields.
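To illustrate what chaining looks like, here is a small sketch that groups by name and then by phone, producing a two-level map. The class ChainedGroupingExample, the getters and the sample data are assumptions made for this example and are not part of the original question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ChainedGroupingExample {

    // Hypothetical stand-in for Person, with getters added for method references.
    static class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        String getName() { return name; }
        String getUid() { return uid; }
        String getPhone() { return phone; }
    }

    // Outer map is keyed by name, inner map by phone.
    static Map<String, Map<String, List<Person>>> byNameThenPhone(List<Person> persons) {
        return persons.stream().collect(
                Collectors.groupingBy(Person::getName,
                        Collectors.groupingBy(Person::getPhone)));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Alice", "u1", "555"),
                new Person("Alice", "u3", "777"),
                new Person("Bob", "u2", "555"));

        Map<String, Map<String, List<Person>>> nested = byNameThenPhone(persons);
        System.out.println(nested.size());                           // prints 2
        System.out.println(nested.get("Alice").size());              // prints 2
        System.out.println(nested.get("Alice").get("555").size());   // prints 1
    }
}
```

Each extra field adds another level of nesting to the result type, which is why this gets unwieldy beyond two fields.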
The better option would be to override the equals and hashCode methods in your Person class to define the equality of two given objects, which in this case would be based on all of the said fields. Then you can group by Person itself, i.e. groupingByConcurrent(Function.identity()), in which case you'll end up with:
ConcurrentMap<Person, List<Person>> resultSet = ....
Example:
class Person {
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Person person = (Person) o;
if (name != null ? !name.equals(person.name) : person.name != null) return false;
if (uid != null ? !uid.equals(person.uid) : person.uid != null) return false;
return phone != null ? phone.equals(person.phone) : person.phone == null;
}
@Override
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + (uid != null ? uid.hashCode() : 0);
result = 31 * result + (phone != null ? phone.hashCode() : 0);
return result;
}
private String name;
private String uid; // these should be private, don't expose
private String phone;
// getters where necessary
// setters where necessary
}
then:
ConcurrentMap<Person, List<Person>> resultSet = list.parallelStream()
.collect(Collectors.groupingByConcurrent(Function.identity()));
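Putting the pieces together, the following is a compact runnable sketch of this approach. It swaps the hand-written null checks for Objects.equals and Objects.hash, which should behave equivalently for equality purposes; the class GroupByIdentityExample, the method groupDuplicates, the constructor and the sample data are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupByIdentityExample {

    static final class Person {
        private final String name;
        private final String uid;
        private final String phone;

        Person(String name, String uid, String phone) {
            this.name = name;
            this.uid = uid;
            this.phone = phone;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Person other = (Person) o;
            // Objects.equals is null-safe, so null fields compare correctly.
            return Objects.equals(name, other.name)
                    && Objects.equals(uid, other.uid)
                    && Objects.equals(phone, other.phone);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, uid, phone);
        }
    }

    // Persons that are equal on all three fields end up in the same list.
    static ConcurrentMap<Person, List<Person>> groupDuplicates(List<Person> list) {
        return list.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity()));
    }

    public static void main(String[] args) {
        List<Person> list = Arrays.asList(
                new Person("Alice", "u1", "555"),
                new Person("Alice", "u1", "555"),
                new Person("Bob", "u2", "555"));

        ConcurrentMap<Person, List<Person>> resultSet = groupDuplicates(list);
        System.out.println(resultSet.size());                                         // prints 2
        System.out.println(resultSet.get(new Person("Alice", "u1", "555")).size());   // prints 2
    }
}
```

Note that this groups persons that match on every field at once, which is a different result shape from a map keyed by individual field values.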