Is it possible to operate on each List from a grouping by collector without an intermediate map being created?

Question

I have the following code that groups a List and then processes each grouped List in turn, converting it to a single item:

Map<Integer, List<Record>> recordsGroupedById = myList.stream()
    .collect(Collectors.groupingBy(r -> r.get("complex_id")));

List<Complex> whatIwant = recordsGroupedById.values().stream().map(this::toComplex)
    .collect(Collectors.toList());

The toComplex function looks like:

Complex toComplex(List<Record> records);

I have the feeling I can do this without creating the intermediate map, perhaps using reduce. Any ideas?

The input stream is ordered with the elements I want grouped sequentially in the stream. Within a normal loop construct I'd be able to determine when the next group starts and create a "Complex" at that time.

Brian Goetz · Accepted Answer

Create a collector that combines groupingBy with your post-processing function, using collectingAndThen as the downstream collector so that the function is applied to each group's list.

Map<Integer, Complex> map = myList.stream()
    .collect(groupingBy(r -> r.get("complex_id"),
                        collectingAndThen(toList(), Xxx::toComplex)));

If you just want a Collection<Complex> here, you can then ask the map for its values().
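
To make the nesting concrete, here is a minimal self-contained sketch of a groupingBy whose downstream collector is collectingAndThen. The Rec and Complex classes and the groupToComplex helper are hypothetical stand-ins for the question's Record, Complex and toComplex, which the thread does not show in full.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupThenFinish {
    // Hypothetical stand-in for the question's Record type.
    static final class Rec {
        final int complexId;
        final String value;
        Rec(int complexId, String value) { this.complexId = complexId; this.value = value; }
    }

    // Hypothetical stand-in for the question's Complex type.
    static final class Complex {
        final int id;
        final List<String> values;
        Complex(int id, List<String> values) { this.id = id; this.values = values; }
    }

    // Plays the role of toComplex: folds one non-empty group into a single object.
    static Complex toComplex(List<Rec> group) {
        List<String> values = group.stream().map(r -> r.value).collect(Collectors.toList());
        return new Complex(group.get(0).complexId, values);
    }

    // The finisher runs once per group, so the map holds Complex values directly.
    static Map<Integer, Complex> groupToComplex(List<Rec> input) {
        return input.stream().collect(
            Collectors.groupingBy(
                r -> r.complexId,
                Collectors.collectingAndThen(Collectors.toList(), GroupThenFinish::toComplex)));
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
            new Rec(1, "a"), new Rec(1, "b"), new Rec(2, "c"));
        Map<Integer, Complex> byId = groupToComplex(input);
        System.out.println(byId.get(1).values); // [a, b]
        System.out.println(byId.get(2).values); // [c]
    }
}
```

Note that a map is still built internally; what disappears is the second pass over Map<Integer, List<Record>>.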

Tagir Valeev · Answer

Well, you can avoid the Map (honestly!) and do everything in a single pipeline using my StreamEx library:

List<Complex> result = StreamEx.of(myList)
        .sortedBy(r -> r.get("complex_id"))
        .groupRuns((r1, r2) -> r1.get("complex_id").equals(r2.get("complex_id")))
        .map(this::toComplex)
        .toList();

Here we first sort the input by complex_id, then use groupRuns, a custom intermediate operation that collects adjacent stream elements into a List as long as the given BiPredicate, applied to two adjacent elements, returns true. The result is a stream of lists, which is mapped to a stream of Complex objects and finally collected into a list.
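
For readers without the library, the run-grouping idea can be sketched with a plain loop. This is not StreamEx's implementation: the AdjacentRuns class and its groupRuns helper are hypothetical, and unlike the library operation this version is eager and builds all runs up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

public class AdjacentRuns {
    // Splits the input into lists of consecutive elements; a new run starts
    // whenever sameGroup returns false for a pair of neighbours.
    static <T> List<List<T>> groupRuns(List<T> input,
                                       BiPredicate<? super T, ? super T> sameGroup) {
        List<List<T>> runs = new ArrayList<>();
        List<T> current = null;
        T previous = null;
        for (T item : input) {
            if (current == null || !sameGroup.test(previous, item)) {
                current = new ArrayList<>();
                runs.add(current);
            }
            current.add(item);
            previous = item;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 1, 2, 2, 2, 3, 1);
        System.out.println(groupRuns(ids, (a, b) -> a.equals(b)));
        // prints [[1, 1], [2, 2, 2], [3], [1]]
    }
}
```

The trailing 1 ends up in its own run because only neighbours are compared, which is why equal keys must be adjacent (for example, after sorting) for this to match a groupingBy result.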

There are no intermediate maps, and groupRuns is lazy (in sequential mode it keeps no more than one intermediate List at a time); it also parallelizes well. On the other hand, my tests show that for unsorted input this solution is slower than the groupingBy-based one, because it involves sorting the whole input. And of course sortedBy (which is just a shortcut for sorted(Comparator.comparing(...))) needs intermediate memory to store the input. If your input is already sorted (or at least partially sorted, so TimSort can run fast), this solution is usually faster than groupingBy.

Tags: java, java-8, java-stream

john16384
