Grouping a Java 8 stream without collecting it

Is there any way in Java 8 to group the elements in a java.util.stream.Stream without collecting them? I want the result to be a Stream again. Because I have to work with a lot of data or even infinite streams, I cannot collect the data first and stream the result again.

All elements that need to be grouped are consecutive in the first stream. That is why I would like to keep the stream evaluation lazy.

Matthias Wimmer asked Aug 18 '16 at 08:08



People also ask

What is the difference between collections and streams in Java 8?

A stream does not store data. An operation on a stream does not modify its source; it simply produces a result. A collection always has a finite size, whereas a stream need not: it can be infinite.
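A minimal sketch of the first two points (class and method names are hypothetical, chosen for illustration). Mapping a list through a stream leaves the list untouched. It also shows a related property documented for java.util.stream: a stream object can serve only one terminal operation, and a second one throws IllegalStateException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamVsCollection {

    // Produces an upper-cased copy; the source list itself is not modified.
    static List<String> upperCaseCopy(List<String> source) {
        return source.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean isSingleUse() {
        Stream<String> s = Stream.of("a", "b");
        s.count();              // first terminal operation consumes the stream
        try {
            s.count();          // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;        // stream has already been operated upon or closed
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("ann", "bob"));
        List<String> upper = upperCaseCopy(names);
        System.out.println(names);         // [ann, bob]
        System.out.println(upper);         // [ANN, BOB]
        System.out.println(isSingleUse()); // true
    }
}
```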

Are Java 8 streams lazy?

Yes. The Java 8 Streams API processes elements only on demand. Intermediate operations such as filter() and map() are lazy: they merely describe the pipeline, and nothing is processed until a terminal operation such as collect() or forEach() is invoked. Elements are then pulled through the pipeline as needed, which lets short-circuiting operations such as limit() stop early and makes it possible to work with very large or even infinite sources.
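A small sketch of that on-demand behavior (the class name and the `inspected` list are hypothetical, added only for demonstration). The source is infinite, yet the pipeline terminates: the filter stage sees exactly as many elements as limit(3) needs, and no more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyDemo {

    // Takes the first three even numbers from the infinite stream 1, 2, 3, ...
    // and records in `inspected` every element that reached the filter stage.
    static List<Integer> firstThreeEvens(List<Integer> inspected) {
        return Stream.iterate(1, n -> n + 1)   // infinite source, nothing computed yet
                .filter(n -> {
                    inspected.add(n);           // side effect for demonstration only
                    return n % 2 == 0;
                })
                .limit(3)                       // short-circuiting: stops after 3 matches
                .collect(Collectors.toList());  // terminal operation triggers evaluation
    }

    public static void main(String[] args) {
        List<Integer> inspected = new ArrayList<>();
        System.out.println(firstThreeEvens(inspected)); // [2, 4, 6]
        System.out.println(inspected);                  // [1, 2, 3, 4, 5, 6]
    }
}
```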

How does Collectors.groupingBy work?

The groupingBy() method of the Collectors class groups objects by some property and stores the results in a Map. To use it, you always supply a classifier function that extracts the property to group by. This provides functionality similar to SQL's GROUP BY clause. Because it is a collector, it runs inside the terminal collect() operation and consumes the entire stream before the Map is returned.
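A short sketch of groupingBy with a classifier, a map factory and a downstream collector (the class and method names are hypothetical). TreeMap is used only to make the key order predictable. Note that the Map exists only after the whole input has been consumed, which is why this collector cannot help with an infinite stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingByDemo {

    // Groups words by their first letter; TreeMap keeps keys sorted.
    static Map<Character, List<String>> byFirstLetter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        w -> w.charAt(0),       // classifier: the property to group by
                        TreeMap::new,           // map factory
                        Collectors.toList()));  // downstream collector for each group
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        System.out.println(byFirstLetter(words));
        // {a=[apple, avocado], b=[banana, blueberry], c=[cherry]}
    }
}
```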

Are streams better than for loops?

Conclusion: for small lists, a plain for loop usually performs better. For very large lists, a parallel stream may perform better, but parallel streams carry considerable overhead, so use them only when you are sure the gain is worth it.


1 Answer

There's no way to do it using the standard Stream API. In general it cannot be done, because a new item belonging to any of the already created groups may always appear later in the input, so no group can be passed downstream until all the input has been processed.

However, if you know in advance that the items to be grouped are always adjacent in the input stream, you can solve your problem with a third-party library that enhances the Stream API. One such library is StreamEx, which is free and written by me. It contains a number of "partial reduction" operations that collapse adjacent items into a single one based on some predicate. Usually you supply a BiPredicate that tests two adjacent items and returns true if they should be grouped together. Some of the partial reduction operations are listed below:

  • collapse(BiPredicate): replaces each group with its first element. For example, collapse(Objects::equals) is useful for removing adjacent duplicates from the stream.
  • groupRuns(BiPredicate): replaces each group with the List of its elements (so StreamEx<T> is converted to StreamEx<List<T>>). For example, stringStream.groupRuns((a, b) -> a.charAt(0) == b.charAt(0)) creates a stream of lists of strings, where each list contains adjacent strings starting with the same letter.
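For readers who cannot add a dependency, the same idea can be sketched with the JDK alone. `groupAdjacent` below is a hypothetical helper written for illustration; it is not StreamEx's code or API. It wraps the source iterator, builds one run per request, and exposes the runs as a new lazy Stream. Mapping each run to `run.get(0)` gives collapse-like behavior. Unlike the library operations described above, this sketch is sequential only, and a run is emitted only once the first element of the next run (or the end of input) has been pulled, so a run that never ends on an infinite stream would block forever.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class AdjacentGrouping {

    // Hypothetical helper (not part of the JDK or of StreamEx): lazily groups runs of
    // adjacent elements for which sameGroup(previous, current) returns true.
    static <T> Stream<List<T>> groupAdjacent(Stream<T> source,
                                             BiPredicate<? super T, ? super T> sameGroup) {
        Iterator<T> it = source.iterator();
        Iterator<List<T>> runs = new Iterator<List<T>>() {
            private T pending;           // first element of the next run, already pulled
            private boolean hasPending;  // separate flag so null elements work too

            @Override
            public boolean hasNext() {
                return hasPending || it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> run = new ArrayList<>();
                T prev = hasPending ? pending : it.next();
                hasPending = false;
                run.add(prev);
                // Pull elements only until one does not belong to the current run.
                while (it.hasNext()) {
                    T cur = it.next();
                    if (!sameGroup.test(prev, cur)) {
                        pending = cur;   // keep it as the start of the next run
                        hasPending = true;
                        break;
                    }
                    run.add(cur);
                    prev = cur;
                }
                return run;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(runs, Spliterator.ORDERED), false)
                .onClose(source::close);
    }

    public static void main(String[] args) {
        // Same grouping rule as the groupRuns example: equal first letters.
        List<List<String>> byLetter = groupAdjacent(
                Stream.of("apple", "avocado", "banana", "blueberry", "cherry", "apricot"),
                (a, b) -> a.charAt(0) == b.charAt(0))
                .collect(Collectors.toList());
        System.out.println(byLetter);
        // [[apple, avocado], [banana, blueberry], [cherry], [apricot]]

        // Also works on an infinite source, because runs are built on demand.
        List<List<Integer>> firstTwo = groupAdjacent(
                Stream.iterate(0, n -> n + 1), (a, b) -> a / 3 == b / 3)
                .limit(2)
                .collect(Collectors.toList());
        System.out.println(firstTwo); // [[0, 1, 2], [3, 4, 5]]
    }
}
```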

Other partial reduction operations include intervalMap(), runLengths() and so on.
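To illustrate the run-length idea behind runLengths() without the library, here is a plain-JDK sketch. `runLengths` below is a hypothetical helper, not StreamEx's implementation. It compares neighbors with Objects.equals and lazily emits one (element, count) entry per run, this time using Spliterators.AbstractSpliterator instead of a hand-written Iterator.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class RunLengths {

    // Hypothetical helper: lazily turns each run of equal adjacent elements
    // into an (element, count) entry.
    static <T> Stream<Map.Entry<T, Long>> runLengths(Stream<T> source) {
        Iterator<T> it = source.iterator();
        Spliterator<Map.Entry<T, Long>> spliterator =
                new Spliterators.AbstractSpliterator<Map.Entry<T, Long>>(
                        Long.MAX_VALUE, Spliterator.ORDERED) {
                    private T pending;          // first element of the next run, already pulled
                    private boolean hasPending;

                    @Override
                    public boolean tryAdvance(Consumer<? super Map.Entry<T, Long>> action) {
                        if (!hasPending && !it.hasNext()) {
                            return false;       // source exhausted
                        }
                        T value = hasPending ? pending : it.next();
                        hasPending = false;
                        long count = 1;
                        while (it.hasNext()) {
                            T next = it.next();
                            if (!Objects.equals(value, next)) {
                                pending = next; // start of the next run
                                hasPending = true;
                                break;
                            }
                            count++;
                        }
                        Map.Entry<T, Long> entry = new AbstractMap.SimpleImmutableEntry<>(value, count);
                        action.accept(entry);
                        return true;
                    }
                };
        return StreamSupport.stream(spliterator, false).onClose(source::close);
    }

    public static void main(String[] args) {
        List<String> encoded = runLengths(Stream.of("a", "a", "b", "c", "c", "c", "a"))
                .map(e -> e.getKey() + "x" + e.getValue())
                .collect(Collectors.toList());
        System.out.println(encoded); // [ax2, bx1, cx3, ax1]
    }
}
```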

All partial reduction operations are lazy, parallel-friendly and quite efficient.

Note that you can easily construct a StreamEx object from a regular Java 8 stream using StreamEx.of(stream). There are also methods to construct it from an array, a Collection, a Reader, etc. The StreamEx class implements the Stream interface and is 100% compatible with the standard Stream API.

Tagir Valeev answered Oct 07 '22 at 10:10
