
Is it possible to operate on each List from a grouping by collector without an intermediate map being created?

I have the following code that does a group by on a List, and then operates on each grouped List in turn converting it to a single item:

Map<Integer, List<Record>> recordsGroupedById = myList.stream()
    .collect(Collectors.groupingBy(r -> r.get("complex_id")));

List<Complex> whatIwant = recordsGroupedById.values().stream().map(this::toComplex)
    .collect(Collectors.toList());

The toComplex function looks like:

Complex toComplex(List<Record> records);

I have the feeling I can do this without creating the intermediate map, perhaps using reduce. Any ideas?

The input stream is ordered with the elements I want grouped sequentially in the stream. Within a normal loop construct I'd be able to determine when the next group starts and create a "Complex" at that time.

john16384 asked Dec 15 '22 09:12


2 Answers

Create a collector that combines groupingBy and your post-processing function with collectingAndThen.

Map<Integer, Complex> map = myList.stream()
    .collect(groupingBy(r -> r.get("complex_id"),
                        collectingAndThen(toList(), Xxx::toComplex)));

If you just want a Collection<Complex> here, you can then ask the map for its values().
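
For anyone who wants to try this out, here is a minimal self-contained sketch of the same idea. The Rec and Complex records and the toComplex body are hypothetical stand-ins for the types in the question (records assume Java 16 or later); only groupingBy, collectingAndThen and toList are real java.util.stream.Collectors methods.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupThenConvert {
    // Hypothetical stand-ins for the question's Record and Complex types.
    record Rec(int complexId, String value) {}
    record Complex(int id, List<String> values) {}

    // Post-processing step: turns one group of records into a single Complex.
    static Complex toComplex(List<Rec> records) {
        List<String> values = records.stream()
                .map(Rec::value)
                .collect(Collectors.toList());
        return new Complex(records.get(0).complexId(), values);
    }

    // Groups by id; each group's List is handed to toComplex as soon as the
    // group is finished, so no Map<Integer, List<Rec>> is exposed to the caller.
    static Map<Integer, Complex> groupAndConvert(List<Rec> input) {
        return input.stream().collect(
                Collectors.groupingBy(
                        Rec::complexId,
                        Collectors.collectingAndThen(
                                Collectors.toList(),
                                GroupThenConvert::toComplex)));
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
                new Rec(1, "a"), new Rec(1, "b"), new Rec(2, "c"));
        Map<Integer, Complex> byId = groupAndConvert(input);
        System.out.println(byId.values());
    }
}
```

Note that groupingBy makes no promise about the order of values(); if the order of the groups matters, the three-argument groupingBy overload accepts a map factory such as LinkedHashMap::new.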

Brian Goetz answered Apr 13 '23 00:04


Well, you can avoid the Map (honestly!) and do everything in a single pipeline using my StreamEx library:

List<Complex> result = StreamEx.of(myList)
        .sortedBy(r -> r.get("complex_id"))
        .groupRuns((r1, r2) -> r1.get("complex_id").equals(r2.get("complex_id")))
        .map(this::toComplex)
        .toList();

Here we first sort the input by complex_id, then use groupRuns, a custom intermediate operation that groups adjacent stream elements into a List whenever the given BiPredicate, applied to two adjacent elements, returns true. This gives a stream of lists, which is mapped to a stream of Complex objects and finally collected into a list.

There are actually no intermediate maps, and groupRuns is lazy (in sequential mode it keeps no more than one intermediate List at a time); it also parallelizes well. On the other hand, my tests show that for unsorted input this solution is slower than the groupingBy-based one, since it involves sorting the whole input. And of course sortedBy (which is just a shortcut for sorted(Comparator.comparing(...))) needs intermediate memory to store the input. If your input is already sorted (or at least partially sorted, so TimSort can run fast), this solution is usually faster than groupingBy.
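
To illustrate what grouping adjacent runs means, here is a plain-Java loop sketch for input that is already ordered by the grouping key. It is not how StreamEx implements groupRuns (it is eager and not parallel); the class name AdjacentRuns and the method mapRuns are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;

class AdjacentRuns {
    // Hypothetical helper: walks the input once and starts a new run whenever
    // sameGroup(previous, current) is false. Each finished run is converted by
    // the finisher, so only one intermediate List is alive at a time.
    static <T, R> List<R> mapRuns(List<T> input,
                                  BiPredicate<T, T> sameGroup,
                                  Function<List<T>, R> finisher) {
        List<R> result = new ArrayList<>();
        List<T> run = new ArrayList<>();
        for (T item : input) {
            if (!run.isEmpty() && !sameGroup.test(run.get(run.size() - 1), item)) {
                result.add(finisher.apply(run));
                run = new ArrayList<>();
            }
            run.add(item);
        }
        // Flush the last run, if the input was not empty.
        if (!run.isEmpty()) {
            result.add(finisher.apply(run));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 1, 2, 2, 2, 3);
        List<Integer> runLengths = mapRuns(ids, (a, b) -> a.equals(b), List::size);
        System.out.println(runLengths); // prints [2, 3, 1]
    }
}
```

With the question's types, the call would look something like mapRuns(myList, (r1, r2) -> r1.get("complex_id").equals(r2.get("complex_id")), this::toComplex), assuming myList is already ordered by complex_id.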

Tagir Valeev answered Apr 13 '23 00:04