
Java Streams GroupingBy and filtering by count (similar to SQL's HAVING)

Do Java (9+) streams support something like SQL's HAVING clause? Use case: grouping and then dropping all groups whose element count does not pass a threshold. Is it possible to write the following SQL clause as a Java stream?

GROUP BY id
HAVING COUNT(*) > 5

The closest I could come up with was:

input.stream()
        .collect(groupingBy(x -> x.id()))
        .entrySet()
        .stream()
        .filter(entry -> entry.getValue().size() > 5)
        .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));

but extracting the entrySet of the grouped result just to collect a second time feels strange, and the terminal collect call in particular is basically mapping a map to itself.

I see that there are collectingAndThen and filtering collectors, but I don't know if they would solve my problem (or rather how to apply them correctly).

Is there a better (more idiomatic) version of the above, or am I stuck with collecting to an intermediate map, filtering that and then collecting to the final map?

knittl asked Apr 23 '20 20:04



2 Answers

The operation has to be performed after the grouping in general, as you need to fully collect a group before you can determine whether it fulfills the criteria.
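To illustrate why a per-element collector cannot express this, here is a minimal sketch. The class name, the method name and the word list are assumptions made up for the example, and words are grouped by their first letter. Collectors.filtering (Java 9+) tests each element on its own, so it can drop elements but never a whole group, and it knows nothing about the group's final size:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original question or answers.
class FilteringIsNotHaving {

    // Groups words by first letter and keeps only words longer than three characters.
    // The predicate is applied per element; the group key is created regardless.
    public static Map<String, List<String>> groupWithFiltering(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                w -> w.substring(0, 1),
                TreeMap::new, // sorted keys, only to make the printed output stable
                Collectors.filtering(w -> w.length() > 3, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "ant", "bee", "cherry");
        // prints {a=[apple, avocado], b=[], c=[cherry]}
        System.out.println(groupWithFiltering(words));
    }
}
```

The key "b" survives with an empty list, which is the opposite of what HAVING does: the elements were filtered, not the group.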

Instead of collecting a map into another, similar map, you can use removeIf to remove non-matching groups from the result map and inject this finishing operation into the collector:

Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(collectingAndThen(groupingBy(x -> x.id(), HashMap::new, toList()),
            m -> {
                m.values().removeIf(l -> l.size() <= 5);
                return m;
            }));

Since the groupingBy(Function) collector makes no guarantees regarding the mutability of the created map, we need to specify a supplier for a mutable map. That in turn requires us to be explicit about the downstream collector, as there is no overloaded groupingBy that takes only a classifier function and a map supplier.
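For reference, a self-contained version of the snippet above might look like the following sketch. The class name, the groupsLargerThan helper and the word list are assumptions for illustration, and the threshold is a parameter instead of the fixed 5:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class wrapping the collectingAndThen/removeIf approach.
class GroupThenPrune {

    // Groups words by first letter into a HashMap, then removes every group
    // whose size is less than or equal to the threshold (HAVING COUNT(*) > threshold).
    public static Map<String, List<String>> groupsLargerThan(List<String> words, int threshold) {
        return words.stream().collect(Collectors.collectingAndThen(
                Collectors.groupingBy(w -> w.substring(0, 1), HashMap::new, Collectors.toList()),
                m -> {
                    m.values().removeIf(group -> group.size() <= threshold);
                    return m;
                }));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "ant", "bee", "cherry", "cat");
        // prints {a=[apple, avocado, ant]}
        System.out.println(groupsLargerThan(words, 2));
    }
}
```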

If this is a recurring task, we can write a custom collector that makes the calling code more readable:

public static <T,K,V> Collector<T,?,Map<K,V>> having(
                      Collector<T,?,? extends Map<K,V>> c, BiPredicate<K,V> p) {
    return collectingAndThen(c, in -> {
        Map<K,V> m = in;
        if(!(m instanceof HashMap)) m = new HashMap<>(m);
        m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
        return m;
    });
}

For higher flexibility, this collector accepts an arbitrary map-producing collector. Since that does not enforce a particular map type, the finisher enforces a mutable map afterwards by copying the result into a new HashMap whenever it is not already one. In practice, the copy won’t happen, as the default is to use a HashMap. It also works without copying when the caller explicitly requests a LinkedHashMap to maintain the order, since LinkedHashMap is a subclass of HashMap. We could even support more cases by changing the line to

if(!(m instanceof HashMap || m instanceof TreeMap
  || m instanceof EnumMap || m instanceof ConcurrentMap)) {
    m = new HashMap<>(m);
}

Unfortunately, there is no standard way to determine whether a map is mutable.
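As a rough illustration of that limitation, the sketch below uses a hypothetical probe (the class and method names are invented for this example) to show that an unmodifiable map only reveals itself when a modification is attempted. Both maps implement the same Map interface, and the probe is destructive for mutable maps, which is why copying defensively is the more practical choice:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, only meant to show why probing for mutability is not viable.
class MutabilityProbe {

    // Returns true if removing entries is rejected with UnsupportedOperationException.
    // Note the side effect: a mutable map is emptied by the probe itself.
    public static boolean rejectsRemoval(Map<String, Integer> map) {
        try {
            map.entrySet().removeIf(e -> true);
            return false;
        } catch (UnsupportedOperationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> mutable = new HashMap<>(Map.of("a", 1));
        Map<String, Integer> readOnly = Collections.unmodifiableMap(new HashMap<>(Map.of("a", 1)));
        System.out.println(rejectsRemoval(readOnly)); // prints true
        System.out.println(rejectsRemoval(mutable));  // prints false
        System.out.println(mutable.isEmpty());        // prints true, the probe removed the entry
    }
}
```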

The custom collector can now be used nicely as

Map<KeyType, List<ElementType>> result =
    input.stream()
        .collect(having(groupingBy(x -> x.id()), (key,list) -> list.size() > 5));
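Put together as something that compiles on its own, the collector and its usage could look like the sketch below. The having method is reproduced from above; the class name, the frequentGroups helper and the word list are assumptions added for the example:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class around the having collector shown above.
class HavingDemo {

    // Wraps a map-producing collector and removes all entries rejected by the predicate.
    public static <T, K, V> Collector<T, ?, Map<K, V>> having(
            Collector<T, ?, ? extends Map<K, V>> c, BiPredicate<K, V> p) {
        return Collectors.collectingAndThen(c, in -> {
            Map<K, V> m = in;
            // Copy only if the result is not a HashMap (or subclass), to ensure mutability.
            if (!(m instanceof HashMap)) m = new HashMap<>(m);
            m.entrySet().removeIf(e -> !p.test(e.getKey(), e.getValue()));
            return m;
        });
    }

    // Groups words by first letter and keeps groups with more than minSize elements.
    public static Map<String, List<String>> frequentGroups(List<String> words, int minSize) {
        return words.stream()
                .collect(having(Collectors.groupingBy(w -> w.substring(0, 1)),
                        (key, list) -> list.size() > minSize));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "ant", "bee", "cherry", "cat");
        // prints {a=[apple, avocado, ant]}
        System.out.println(frequentGroups(words, 2));
    }
}
```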
Holger answered Sep 24 '22 13:09



The only way I am aware of is to use Collectors.collectingAndThen and move the same entry-set filtering from the question into the finisher function:

Map<Integer, List<Item>> a = input.stream().collect(Collectors.collectingAndThen(
        Collectors.groupingBy(Item::id),
        map -> map.entrySet().stream()
                             .filter(e -> e.getValue().size() > 5)
                             .collect(Collectors.toMap(Entry::getKey, Entry::getValue))));
Nikolas Charalambidis answered Sep 21 '22 13:09
