Is there any way in Java 8 to group the elements in a java.util.stream.Stream without collecting them? I want the result to be a Stream again. Because I have to work with a lot of data, or even infinite streams, I cannot collect the data first and then stream the result again.
All elements that need to be grouped are consecutive in the first stream, so I would like to keep the stream evaluation lazy.
Differences between a Stream and a Collection: a stream does not store data, and an operation on a stream does not modify its source but simply produces a result. A collection has a finite size, whereas a stream may be unbounded.
The Java 8 Streams API processes elements only on demand, which is what makes it lazy. Intermediate operations do no work until a terminal operation runs, and this processing model lets a pipeline handle large amounts of data efficiently.
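The on-demand behaviour can be observed with a small sketch. The class and method names (LazyStreamDemo, firstThreeEvenSquares, visited) are made up for illustration; only standard JDK calls are used. An infinite source is combined with peek to record which elements are actually evaluated before limit short-circuits the pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {
    // Records which elements were actually pulled from the infinite source.
    static final List<Integer> visited = new ArrayList<>();

    static List<Integer> firstThreeEvenSquares() {
        visited.clear();
        return Stream.iterate(1, n -> n + 1)   // infinite source: 1, 2, 3, ...
                .peek(visited::add)            // note every element that gets evaluated
                .filter(n -> n % 2 == 0)       // keep even numbers only
                .map(n -> n * n)               // square them
                .limit(3)                      // stop after three results
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstThreeEvenSquares()); // prints [4, 16, 36]
        System.out.println(visited);                 // only a short prefix of the source
    }
}
```

The pipeline terminates even though the source never ends, because elements are pulled one at a time and only until limit is satisfied.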
The groupingBy() method of the Collectors class groups objects by some property and stores the results in a Map instance. To use it, you must supply a classifier function that extracts the property to group by. It provides functionality similar to SQL's GROUP BY clause.
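As a minimal sketch of groupingBy() (the class name GroupingByDemo, the method byFirstLetter and the sample words are invented for illustration), the following groups strings by their first character. Note that collect is a terminal operation: the whole input is consumed before the Map is available, which is exactly what the question wants to avoid.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByDemo {
    // Groups words by their first character; the result is a fully built Map.
    static Map<Character, List<String>> byFirstLetter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "apricot");
        // "apricot" joins the 'a' group even though it is not adjacent to the other 'a' words.
        System.out.println(byFirstLetter(words));
    }
}
```

Because non-adjacent elements can land in the same group, no group can be declared complete until the input ends, so this approach cannot work on an infinite stream.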
Conclusion: for a small list, a for loop tends to perform better; for a huge list, a parallel stream can perform better. Because parallel streams carry quite a bit of overhead, use them only when you are sure the overhead is worth it.
There's no way to do it using the standard Stream API. In general you cannot do it, because it's always possible that a new item will appear later which belongs to one of the already created groups, so you cannot pass a group to downstream processing until you have processed all the input.
However, if you know in advance that the items to be grouped are always adjacent in the input stream, you can solve your problem using third-party libraries that enhance the Stream API. One such library is StreamEx, which is free and written by me. It contains a number of "partial reduction" operators which collapse adjacent items into a single item based on some predicate. Usually you supply a BiPredicate which tests two adjacent items and returns true if they should be grouped together. Some of the partial reduction operations are listed below:
collapse(BiPredicate): replaces each group with the first element of the group. For example, collapse(Objects::equals) is useful to remove adjacent duplicates from the stream.
groupRuns(BiPredicate): replaces each group with the List of group elements (so StreamEx<T> is converted to StreamEx<List<T>>). For example, stringStream.groupRuns((a, b) -> a.charAt(0) == b.charAt(0)) will create a stream of Lists of strings where each list contains adjacent strings starting with the same letter.
Other partial reduction operations include intervalMap, runLengths() and so on.
All partial reduction operations are lazy, parallel-friendly and quite efficient.
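To make the idea of grouping adjacent items concrete without adding a dependency, here is a rough plain-JDK sketch. It is not how StreamEx implements groupRuns; the class AdjacentGrouping and the method groupAdjacent are hypothetical names. It wraps the source stream's iterator and emits a List each time the BiPredicate reports that two neighbouring elements no longer belong together. Unlike the StreamEx operations, this sketch is sequential only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class AdjacentGrouping {
    // Hypothetical helper: lazily groups runs of adjacent elements.
    static <T> Stream<List<T>> groupAdjacent(Stream<T> source,
                                             BiPredicate<? super T, ? super T> sameGroup) {
        Iterator<T> it = source.iterator();
        Iterator<List<T>> groups = new Iterator<List<T>>() {
            private T pending;          // first element of the next group, read ahead
            private boolean hasPending;

            @Override
            public boolean hasNext() {
                return hasPending || it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> group = new ArrayList<>();
                T last = hasPending ? pending : it.next();
                hasPending = false;
                pending = null;
                group.add(last);
                // Pull elements only until the current run ends.
                while (it.hasNext()) {
                    T candidate = it.next();
                    if (sameGroup.test(last, candidate)) {
                        group.add(candidate);
                        last = candidate;
                    } else {
                        pending = candidate;   // keep it for the next group
                        hasPending = true;
                        break;
                    }
                }
                return group;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(groups, Spliterator.ORDERED), false)
                .onClose(source::close);
    }

    public static void main(String[] args) {
        Stream<String> words = Stream.of("apple", "avocado", "banana", "blueberry", "apricot");
        System.out.println(
                groupAdjacent(words, (a, b) -> a.charAt(0) == b.charAt(0))
                        .collect(Collectors.toList()));
        // prints [[apple, avocado], [banana, blueberry], [apricot]]
    }
}
```

The sketch reads one element beyond the end of each group, and a single group must still fit in memory. It works on an infinite source as long as every run eventually ends, for example groupAdjacent(Stream.iterate(0, n -> n + 1), (a, b) -> a / 3 == b / 3).limit(2).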
Note that you can easily construct a StreamEx object from a regular Java 8 stream using StreamEx.of(stream). There are also methods to construct it from an array, Collection, Reader, etc. The StreamEx class implements the Stream interface and is 100% compatible with the standard Stream API.