I recently learned about streams in Java 8 and started to work with them. Now I have a question regarding the groupingBy collector method:
Usually I work with .NET, so I compared (knowing they are not the same) Java Stream<T> with .NET IEnumerable<T>. Following this comparison, List<T> stores elements and the particular Stream/IEnumerable applies operations. One example:
C#:
elements.Where(x => x.Value == 5).ToList();
Java:
elements.stream().filter(x -> x.getValue() == 5).collect(Collectors.toList());
In both examples, I start with a list, define operations (a filter in this example) and collect the result to store it (in a new list in this example).
Now I got a more complex case:
data.stream()
.map( ... ).filter( ... ) // Some operations
.collect(groupingBy(Chunk::getName, summingLong(Chunk::getValue)));
The result of this query is a Map<String, Long> and I can work with this, but let's say I want to proceed with this data instead of storing it. My current approach is trivial:
...
.collect(groupingBy(Chunk::getName, summingLong(Chunk::getValue)))
.entrySet().stream()
.map( ... ) // Do more operations
But this way, I leave the stream, store the first result in a Map and open a new stream to continue. Is there a way to group without a collector, so that I can "stay" in the stream?
peek() usage: the Javadoc for peek() says: “This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline”.
The groupingBy() method in Java 8 lets developers perform a GROUP BY operation directly. GROUP BY is a SQL clause that groups records by specified criteria so that each group can be aggregated.
The groupingBy() method of the Collectors class is used for grouping objects by some property and storing the results in a Map instance. To use it, we always need to specify the property by which the grouping is performed (the classifier function). This method provides functionality similar to SQL's GROUP BY clause.
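As a minimal sketch of the two most common forms, assuming a hypothetical Chunk class with getName() and getValue() modeled on the one in the question (the class name GroupingByExample and the sample data are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByExample {
    // Hypothetical element type, modeled on the Chunk used in the question.
    static final class Chunk {
        private final String name;
        private final long value;

        Chunk(String name, long value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        long getValue() { return value; }
    }

    // One-argument form: each key maps to the list of elements sharing that key.
    static Map<String, List<Chunk>> byName(List<Chunk> data) {
        return data.stream().collect(Collectors.groupingBy(Chunk::getName));
    }

    // Two-argument form: a downstream collector reduces each group,
    // roughly like SELECT name, SUM(value) ... GROUP BY name.
    static Map<String, Long> sumByName(List<Chunk> data) {
        return data.stream().collect(
                Collectors.groupingBy(Chunk::getName, Collectors.summingLong(Chunk::getValue)));
    }

    public static void main(String[] args) {
        List<Chunk> data = Arrays.asList(
                new Chunk("a", 1), new Chunk("b", 2), new Chunk("a", 3));
        System.out.println(byName(data).get("a").size()); // 2
        System.out.println(sumByName(data));
    }
}
```

In both forms the result is a Map that is fully populated before collect returns.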
You can do whatever you like in the downstream collector, as long as you can describe the operation as a Collector. In Java 8, there is only an equivalent to the intermediate operation map, the mapping collector; Java 9 adds filtering and flatMapping (which you could also implement yourself in Java 8), and there's already an equivalent to almost every terminal operation.
Of course, a nested application of collectors will look entirely different from a chain of Stream operations doing the same…
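To illustrate what such nesting can look like, here is a rough sketch. The Chunk class and the class name DownstreamCollectors are hypothetical, and the filtering method is a hand-written Java 8 stand-in written for this example, not the JDK's own source; on Java 9 or later Collectors.filtering could be used instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class DownstreamCollectors {
    // Hypothetical element type, modeled on the Chunk used in the question.
    static final class Chunk {
        private final String name;
        private final long value;

        Chunk(String name, long value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        long getValue() { return value; }
    }

    // Assumed Java 8 stand-in for a filtering collector: it wraps the
    // downstream accumulator and only forwards elements that pass the predicate.
    static <T, A, R> Collector<T, ?, R> filtering(
            Predicate<? super T> predicate, Collector<? super T, A, R> downstream) {
        BiConsumer<A, ? super T> accumulator = downstream.accumulator();
        return Collector.of(
                downstream.supplier(),
                (acc, t) -> { if (predicate.test(t)) accumulator.accept(acc, t); },
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(new Collector.Characteristics[0]));
    }

    // mapping is the collector counterpart of the intermediate map operation.
    static Map<String, List<Long>> valuesByName(List<Chunk> data) {
        return data.stream().collect(Collectors.groupingBy(
                Chunk::getName,
                Collectors.mapping(Chunk::getValue, Collectors.toList())));
    }

    // Filtering inside the group: a name whose chunks are all rejected
    // still appears as a key, with a count of 0.
    static Map<String, Long> countAboveByName(List<Chunk> data, long threshold) {
        return data.stream().collect(Collectors.groupingBy(
                Chunk::getName,
                filtering((Chunk c) -> c.getValue() > threshold, Collectors.counting())));
    }

    public static void main(String[] args) {
        List<Chunk> data = Arrays.asList(
                new Chunk("a", 1), new Chunk("b", 2), new Chunk("a", 3));
        System.out.println(valuesByName(data));
        System.out.println(countAboveByName(data, 2));
    }
}
```

Note the difference from calling filter() on the stream before collecting: in that case a group with no surviving elements would not show up in the map at all.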
If, however, you want to process complete groups, there is no way around completing the grouping collection first. This is not a limitation of the API but intrinsic to the grouping operation, or to any operation in general: if you want to process a complete result, you'll need to complete the operation first. Regardless of what the API looks like (e.g. you could hide the follow-up operation in the collector in a collectingAndThen-like manner), creating and populating the Map is unavoidable, as it's the map that maintains the groups. The groups are determined by the keys and lookup logic of the Map, so, e.g., using a SortedMap with a custom comparator or an IdentityHashMap can change the grouping logic entirely.
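The point about the map's lookup logic can be sketched with the three-argument groupingBy overload, which takes a map factory. The class name GroupingMapChoice and the sample strings are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingMapChoice {
    // Default map: in practice keys are matched by equals/hashCode,
    // so "a" and "A" end up in separate groups.
    static Map<String, Long> countExact(List<String> names) {
        return names.stream().collect(
                Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Same classifier, but the TreeMap's case-insensitive comparator decides
    // which keys are "the same", so "a" and "A" share one group.
    static Map<String, Long> countIgnoringCase(List<String> names) {
        return names.stream().collect(Collectors.groupingBy(
                Function.identity(),
                () -> new TreeMap<String, Long>(String.CASE_INSENSITIVE_ORDER),
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "A", "b");
        System.out.println(countExact(names).size());        // 3
        System.out.println(countIgnoringCase(names).size()); // 2
    }
}
```

The classifier function is identical in both methods; only the Map implementation differs, and that alone changes which elements are grouped together.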
As the API is right now, you can't escape it.
groupingBy is a collector used with collect, which is a terminal operation (it does not return a Stream), so that operation will end the stream.
Depending on what you want to do later inside the last map operation, you could create a custom collector that will "stay" inside the stream, even if inside it you would probably still gather the elements into a Map.
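One possible shape for such a collector, as a sketch only: wrap groupingBy in collectingAndThen so that collect itself returns a new Stream over the finished groups. The helper name sumsByNameAsStream, the class GroupThenStream and the Chunk class are hypothetical, and the Map is still built in full before the returned stream produces anything.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupThenStream {
    // Hypothetical element type, modeled on the Chunk used in the question.
    static final class Chunk {
        private final String name;
        private final long value;

        Chunk(String name, long value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        long getValue() { return value; }
    }

    // Hypothetical helper: groups and sums, then hands back a Stream over the
    // entries of the completed map, so the call chain reads as one pipeline.
    static Collector<Chunk, ?, Stream<Map.Entry<String, Long>>> sumsByNameAsStream() {
        return Collectors.collectingAndThen(
                Collectors.groupingBy(Chunk::getName, Collectors.summingLong(Chunk::getValue)),
                (Map<String, Long> sums) -> sums.entrySet().stream());
    }

    // Usage: the chain looks continuous, although a Map exists in between.
    static List<String> namesWithSumAbove(List<Chunk> data, long threshold) {
        return data.stream()
                .collect(sumsByNameAsStream())
                .filter(e -> e.getValue() > threshold)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Chunk> data = Arrays.asList(
                new Chunk("a", 1), new Chunk("b", 2), new Chunk("a", 3), new Chunk("c", 5));
        System.out.println(namesWithSumAbove(data, 2)); // [a, c]
    }
}
```

This only changes how the code reads. It is equivalent to calling .entrySet().stream() on the collected map, as in the question.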