
Java 8 Stream API - Selecting only values after Collectors.groupingBy(..)

Say I have a collection of Student objects, each consisting of a name (String), an age (int) and a city (String).
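
For reference, here is a minimal sketch of what such a Student class could look like. The question only names the three properties and the getters getAge() and getCity(), so the constructor, the getName() getter, the sampleStudents() helper and the sample data are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class StudentSample {
    // Hypothetical Student type: the question only states its three properties.
    static final class Student {
        private final String name;
        private final int age;
        private final String city;

        Student(String name, int age, String city) {
            this.name = name;
            this.age = age;
            this.city = city;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    // Made-up sample data: two students in one city, one in another.
    static List<Student> sampleStudents() {
        return Arrays.asList(
                new Student("Alice", 22, "Paris"),
                new Student("Bob", 19, "Paris"),
                new Student("Carol", 30, "Berlin"));
    }

    public static void main(String[] args) {
        for (Student s : sampleStudents()) {
            System.out.println(s.getName() + ", " + s.getAge() + ", " + s.getCity());
        }
    }
}
```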

I am trying to use Java's Stream API to achieve the following SQL-like behavior:

SELECT MAX(age)
FROM Students
GROUP BY city

Now, I found two different ways to do so:

final List<Integer> variation1 =
            students.stream()
                    .collect(Collectors.groupingBy(Student::getCity, Collectors.maxBy((s1, s2) -> s1.getAge() - s2.getAge())))
                    .values()
                    .stream()
                    .filter(Optional::isPresent)
                    .map(Optional::get)
                    .map(Student::getAge)
                    .collect(Collectors.toList());

And the other one:

final Collection<Integer> variation2 =
            students.stream()
                    .collect(Collectors.groupingBy(Student::getCity,
                            Collectors.collectingAndThen(Collectors.maxBy((s1, s2) -> s1.getAge() - s2.getAge()),
                                    optional -> optional.get().getAge())))
                    .values();

In both variations, one has to call .values() on the resulting map and filter out (or unwrap) the Optionals returned by the collector.

Is there any other way to achieve this required behavior?

These approaches remind me of SQL's OVER (PARTITION BY ...) clauses...

Thanks


Edit: All the answers below are really interesting, but unfortunately they are not quite what I was looking for, since what I am trying to get is just the values. I don't need the keys, only the values.

Asked by Ghost93 on Feb 29 '16 at 22:02



3 Answers

The second approach calls get() on an Optional. This is usually a bad idea, as you don't know whether the Optional will be empty (use the orElse(), orElseGet() or orElseThrow() methods instead). You might argue that in this case there will always be a value, since the groups are generated from the student list itself, but it is something to keep in mind.

Based on that, you might turn variation 2 into:

final Collection<Integer> variation2 =
     students.stream()
             .collect(collectingAndThen(groupingBy(Student::getCity,
                                                   collectingAndThen(
                                                      mapping(Student::getAge, maxBy(naturalOrder())),
                                                      Optional::get)), 
                                        Map::values));

Since that really starts to be difficult to read, I would probably use variation 1 instead:

final List<Integer> variation1 =
        students.stream()
            .collect(groupingBy(Student::getCity,
                                mapping(Student::getAge, maxBy(naturalOrder()))))
            .values()
            .stream()
            .map(Optional::get)
            .collect(toList());
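
The advice about not calling get() blindly can be sketched as follows. This is only an illustration under assumptions: the Student class, the maxAges() helper and the sample data are hypothetical, and orElseThrow with an explicit exception is just one of the alternatives mentioned above. It behaves like variation 1 but fails with a descriptive message if an empty Optional ever appeared.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class SafeUnwrapSketch {
    // Hypothetical Student type, reduced to what this example needs.
    static final class Student {
        private final int age;
        private final String city;

        Student(int age, String city) {
            this.age = age;
            this.city = city;
        }

        int getAge() { return age; }
        String getCity() { return city; }
    }

    static List<Integer> maxAges(List<Student> students) {
        // Step 1: group by city and keep the maximum age of each group as an Optional.
        Map<String, Optional<Integer>> maxByCity = students.stream()
                .collect(Collectors.groupingBy(Student::getCity,
                        Collectors.mapping(Student::getAge,
                                Collectors.maxBy(Comparator.<Integer>naturalOrder()))));

        // Step 2: unwrap each Optional explicitly instead of calling get().
        return maxByCity.values().stream()
                .map(max -> max.orElseThrow(
                        () -> new IllegalStateException("unexpected empty group")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student(22, "Paris"),
                new Student(19, "Paris"),
                new Student(30, "Berlin"));
        System.out.println(maxAges(students));
    }
}
```

The order of the resulting list follows the iteration order of the underlying map, so it should not be relied upon.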

Answered by Alexis C. on Sep 28 '22 at 09:09


Do not always stick with groupingBy. Sometimes toMap is the thing you need:

Collection<Integer> result = students.stream()
    .collect(Collectors.toMap(Student::getCity, Student::getAge, Integer::max))
    .values();

Here you simply create a Map whose keys are cities and whose values are ages. When several students share the same city, the merge function is applied, which here just selects the larger age. It's faster and cleaner.
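
To make the role of the merge function visible, here is a small runnable sketch that returns the intermediate map rather than only its values. The Student class, the maxAgeByCity() helper and the sample data are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapMergeSketch {
    // Hypothetical Student type, reduced to what this example needs.
    static final class Student {
        private final int age;
        private final String city;

        Student(int age, String city) {
            this.age = age;
            this.city = city;
        }

        int getAge() { return age; }
        String getCity() { return city; }
    }

    static Map<String, Integer> maxAgeByCity(List<Student> students) {
        // Key: city, value: age. When a second student maps to an existing
        // city, Integer::max decides which of the two ages is kept.
        return students.stream()
                .collect(Collectors.toMap(Student::getCity, Student::getAge, Integer::max));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student(22, "Paris"),
                new Student(19, "Paris"),
                new Student(30, "Berlin"));
        Map<String, Integer> byCity = maxAgeByCity(students);
        System.out.println("Paris: " + byCity.get("Paris"));
        System.out.println("Berlin: " + byCity.get("Berlin"));
        // Only the values are needed for the original question.
        System.out.println(byCity.values());
    }
}
```

Without the third argument, toMap would throw an IllegalStateException on the first duplicate key, which is why the merge function is required here.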

Answered by Tagir Valeev on Sep 28 '22 at 09:09


As an addition to Tagir’s great answer, which uses toMap instead of groupingBy, here is a short solution if you want to stick with groupingBy:

Collection<Integer> result = students.stream()
    .collect(Collectors.groupingBy(Student::getCity,
                 Collectors.reducing(-1, Student::getAge, Integer::max)))
    .values();

Note that this three-argument reducing collector already performs a mapping operation, so we don’t need to nest it in a mapping collector. Furthermore, providing an identity value avoids dealing with Optional. Since ages are never negative, -1 is a sufficient identity, and since a group always has at least one element, the identity value will never show up as a result.
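
A runnable sketch of this, assuming a hypothetical Student class, a maxAgeByCity() helper and made-up sample data, shows that the identity -1 does not leak into the result, not even for an empty input, because groupingBy only creates a group once it sees an element for it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReducingIdentitySketch {
    // Hypothetical Student type, reduced to what this example needs.
    static final class Student {
        private final int age;
        private final String city;

        Student(int age, String city) {
            this.age = age;
            this.city = city;
        }

        int getAge() { return age; }
        String getCity() { return city; }
    }

    static Map<String, Integer> maxAgeByCity(List<Student> students) {
        // reducing(identity, mapper, op): map each student to its age, then
        // fold the ages of a group with Integer::max, starting from -1.
        return students.stream()
                .collect(Collectors.groupingBy(Student::getCity,
                        Collectors.reducing(-1, Student::getAge, Integer::max)));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student(22, "Paris"),
                new Student(19, "Paris"),
                new Student(30, "Berlin"));
        System.out.println(maxAgeByCity(students).values());
        // No students means no groups, so the identity value never appears.
        System.out.println(maxAgeByCity(Collections.<Student>emptyList()).isEmpty());
    }
}
```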

Still, I think Tagir’s toMap based solution is preferable in this scenario.


The groupingBy-based solution becomes more interesting when you want to get the actual students having the maximum age, e.g.:

Collection<Student> result = students.stream().collect(
   Collectors.groupingBy(Student::getCity, Collectors.reducing(null, BinaryOperator.maxBy(
     Comparator.nullsFirst(Comparator.comparingInt(Student::getAge)))))
).values();

Well, actually, even this can be expressed using the toMap collector:

Collection<Student> result = students.stream().collect(
    Collectors.toMap(Student::getCity, Function.identity(),
        BinaryOperator.maxBy(Comparator.comparingInt(Student::getAge)))
).values();

You can express almost everything with both collectors, but groupingBy has the advantage on its side when you want to perform a mutable reduction on the values.
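
As an illustration of that last point, here is a sketch of a mutable reduction on the grouped values: collecting the names of all students per city into a list. The Student class, its getName() getter, the namesByCity() helper and the sample data are assumptions made for this example. With toMap, the same result would require building and merging a list for every element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MutableReductionSketch {
    // Hypothetical Student type, reduced to what this example needs.
    static final class Student {
        private final String name;
        private final String city;

        Student(String name, String city) {
            this.name = name;
            this.city = city;
        }

        String getName() { return name; }
        String getCity() { return city; }
    }

    static Map<String, List<String>> namesByCity(List<Student> students) {
        // The downstream toList() collector accumulates into one mutable
        // list per city instead of combining immutable values pairwise.
        return students.stream()
                .collect(Collectors.groupingBy(Student::getCity,
                        Collectors.mapping(Student::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student("Alice", "Paris"),
                new Student("Bob", "Paris"),
                new Student("Carol", "Berlin"));
        Map<String, List<String>> byCity = namesByCity(students);
        System.out.println("Paris: " + byCity.get("Paris"));
        System.out.println("Berlin: " + byCity.get("Berlin"));
    }
}
```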

Answered by Holger on Sep 28 '22 at 08:09