Why do we classify certain collectors as "downstream"? Is there an upstream Collector, then? Please note that this is not about usage; I am trying to understand the logic behind the term "downstream". To me, when you normally deal with the Stream API, every operation further down the builder chain looks like it is downstream anyway.
    List<String> list = List.of("AAA", "B", "CCCCC", "DDD", "FFFFFF", "AAA");
    List<Integer> res =
        list.stream()
            .collect(
                Collectors.mapping(s -> s.length(), // string -> int
                    Collectors.toList()));          // downstreaming
In the above code, Collectors.toList() is regarded as downstream.
Returns a Collector which performs a reduction of its input elements under a specified BinaryOperator using the provided identity. API Note: The reducing() collectors are most useful when used in a multi-level reduction, downstream of groupingBy or partitioningBy . To perform a simple reduction on a stream, use Stream.
The term "downstream" in the documentation refers to a Collector that is passed as an argument to another Collector. The argument is applied downstream of (after) the Collector that accepts it. In other words, the downstream Collector is applied to the output of the upstream Collector.
In your example, Collectors.toList is downstream from Collectors.mapping.
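As a minimal sketch of that relationship (the class and method names here are made up for illustration, not taken from the question), the following compares mapping with toList as its downstream against the more familiar map() followed by collect(). Both should produce the same list of lengths:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; the name is an assumption for illustration only.
class MappingDownstreamDemo {

    // mapping is the upstream collector: it transforms each String to its length
    // and hands the result to its downstream collector, toList.
    static List<Integer> lengthsViaMapping(List<String> words) {
        return words.stream()
                .collect(Collectors.mapping(String::length, Collectors.toList()));
    }

    // The same result, written with an intermediate map() step instead.
    static List<Integer> lengthsViaMap(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("AAA", "B", "CCCCC", "DDD", "FFFFFF", "AAA");
        System.out.println(lengthsViaMapping(words)); // [3, 1, 5, 3, 6, 3]
        System.out.println(lengthsViaMap(words));     // [3, 1, 5, 3, 6, 3]
    }
}
```

On its own, mapping offers nothing over map() here; the collector form is shown only to make the upstream/downstream pairing visible.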
I often imagine the Stream API as building a production line for a product. There are raw materials coming from somewhere (ArrayList.stream, IntStream.range, Stream.of, whatever) on a conveyor belt. With intermediate methods, the materials get transformed (map/flatMap etc.) and filtered (filter/limit etc.), and finally they reach the end of the line, where they get assembled into one final product (collect)*.
Collectors are the "machines" that build those final products. toList builds a List, toSet builds a Set, etc. However, some collectors don't fully build the final product, e.g. groupingBy. groupingBy only groups the materials by a key, and then spits the items out again, as groups, back onto the conveyor belt. These collectors need another collector down the production line (aka down the stream) to continue building the final product.
mapping is another one of those collectors that doesn't completely build the final product. It merely transforms the materials and spits them out again, which is kind of like map. Its usefulness shows when you want to, e.g., transform the groups spit out by a groupingBy. That is, it's mostly useful when you use it as the downstream of another collector.
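A sketch of that multi-level arrangement might look like the following, where groupingBy feeds mapping, which in turn feeds toList. The class name and the choice of "first character" as the transformation are assumptions made purely for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; the name is an assumption for illustration only.
class MultiLevelDownstreamDemo {

    // Three machines in a row:
    //   groupingBy (upstream) -> mapping (downstream of groupingBy) -> toList (downstream of mapping)
    static Map<Integer, List<Character>> firstCharsByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length,                  // group the words by length
                        Collectors.mapping(
                                s -> s.charAt(0),        // transform each grouped word
                                Collectors.toList())));  // assemble each group into a List
    }

    public static void main(String[] args) {
        List<String> words = List.of("AAA", "B", "CCCCC", "DDD", "FFFFFF", "AAA");
        System.out.println(firstCharsByLength(words));
    }
}
```

Here map() cannot be used in place of mapping, because the transformation has to happen after the grouping step, inside each group.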
Is there an upstream Collector then?
Following the production line analogy, the relationship goes both ways: toList is the downstream of mapping, so mapping is the upstream of toList. In the official documentation, though, the word "upstream" isn't mentioned much. I only found it in peek.
*There are other terminal operations, but let's focus on collect, since this is what the question is about.