I have the following code that does a group by on a List, and then operates on each grouped List in turn converting it to a single item:
Map<Integer, List<Record>> recordsGroupedById = myList.stream()
    .collect(Collectors.groupingBy(r -> r.get("complex_id")));
List<Complex> whatIwant = recordsGroupedById.values().stream()
    .map(this::toComplex)
    .collect(Collectors.toList());
The toComplex function looks like:
Complex toComplex(List<Record> records);
I have the feeling I can do this without creating the intermediate map, perhaps using reduce. Any ideas?
The input stream is ordered with the elements I want grouped sequentially in the stream. Within a normal loop construct I'd be able to determine when the next group starts and create a "Complex" at that time.
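For reference, the loop-based approach mentioned above could look roughly like the sketch below. The class RunGrouping and its groupRuns helper are hypothetical names that are not part of the original code. The sketch assumes that elements belonging to the same group are adjacent in the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper, not part of the original code: walks the input once,
// closes a group whenever the key changes, and converts each finished group.
final class RunGrouping {
    static <T, K, R> List<R> groupRuns(List<T> input,
                                       Function<T, K> keyOf,
                                       Function<List<T>, R> finisher) {
        List<R> result = new ArrayList<>();
        List<T> run = new ArrayList<>();
        K currentKey = null;
        for (T item : input) {
            K key = keyOf.apply(item);
            // A different key means the previous run is complete.
            if (!run.isEmpty() && !Objects.equals(key, currentKey)) {
                result.add(finisher.apply(run));
                run = new ArrayList<>();
            }
            currentKey = key;
            run.add(item);
        }
        // Convert the last run, which has no following element to close it.
        if (!run.isEmpty()) {
            result.add(finisher.apply(run));
        }
        return result;
    }
}
```

With the names from the question, a call would be something like RunGrouping.groupRuns(myList, r -> r.get("complex_id"), this::toComplex). No Map is built, but the result is only correct if the input really is ordered by group.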
Create a collector that combines groupingBy and your post-processing function with collectingAndThen.
// Assumes static imports of Collectors.groupingBy, collectingAndThen and toList.
Map<Integer, Complex> map = myList.stream()
    .collect(groupingBy(r -> r.get("complex_id"),
        collectingAndThen(toList(), this::toComplex)));
If you just want a Collection<Complex> here, you can then ask the map for its values().
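As a self-contained illustration of this approach, here is a minimal sketch. Row and Summary are hypothetical stand-ins for Record and Complex, whose definitions are not shown in the question. The conversion simply counts the rows in each group.

```java
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Record: a group id plus a payload.
final class Row {
    final int complexId;
    final String value;

    Row(int complexId, String value) {
        this.complexId = complexId;
        this.value = value;
    }
}

// Hypothetical stand-in for Complex: the group id and the number of rows in it.
final class Summary {
    final int id;
    final int count;

    Summary(int id, int count) {
        this.id = id;
        this.count = count;
    }
}

final class GroupThenConvert {
    // Plays the role of toComplex: turns one group of rows into a single item.
    static Summary toSummary(List<Row> rows) {
        return new Summary(rows.get(0).complexId, rows.size());
    }

    // Groups by id; each group is collected to a List and converted right away,
    // so the resulting map holds Summary values instead of lists.
    static Map<Integer, Summary> summarize(List<Row> rows) {
        return rows.stream()
            .collect(groupingBy(r -> r.complexId,
                collectingAndThen(toList(), GroupThenConvert::toSummary)));
    }
}
```

Note that a map is still built internally. The difference from the original code is that its values are already converted, so no second stream over values() is needed.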
Well, you can avoid the Map (honestly!) and do everything in a single pipeline using my StreamEx library:
List<Complex> result = StreamEx.of(myList)
    .sortedBy(r -> r.get("complex_id"))
    .groupRuns((r1, r2) -> r1.get("complex_id").equals(r2.get("complex_id")))
    .map(this::toComplex)
    .toList();
Here we first sort the input by complex_id, then use the custom groupRuns intermediate operation, which groups adjacent stream elements into a List if the given BiPredicate applied to two adjacent elements returns true. That gives a stream of lists, which is mapped to a stream of Complex objects and finally collected into a list.
No intermediate map is created, and groupRuns is lazy (in sequential mode it keeps no more than one intermediate List at a time); it also parallelizes well. On the other hand, my tests show that for unsorted input this solution is slower than the groupingBy-based one, because it involves sorting the whole input. And of course sortedBy (which is just a shortcut for sorted(Comparator.comparing(...))) needs intermediate memory to store the input. If your input is already sorted (or at least partially sorted, so TimSort can run fast), this solution is usually faster than groupingBy.