Java 8 introduced the Stream API, and java.util.stream.Collector is the interface that defines how the elements of a stream are aggregated/collected.
The Collector interface is designed like this:
public interface Collector<T, A, R> {
    Supplier<A> supplier();
    BiConsumer<A, T> accumulator();
    BinaryOperator<A> combiner();
    Function<A, R> finisher();
}
Why is it not designed like the following?
public interface Collector<T, A, R> {
    A supply();
    void accumulate(A accumulator, T value);
    A combine(A left, A right);
    R finish(A accumulator);
}
The latter is much easier to implement. What were the considerations behind designing it the former way?
collect() is one of the terminal operations of the Java 8 Stream API. It performs mutable fold operations on the elements held in a Stream instance, such as repackaging them into a data structure, applying additional logic, or concatenating them.
A Collector is a mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed.
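As a minimal sketch of that definition, the four functions can be supplied directly through the Collector.of factory. The class name JoinCollectorSketch and the helper bracketed() are made up for illustration and are not part of the JDK.

```java
import java.util.StringJoiner;
import java.util.stream.Collector;
import java.util.stream.Stream;

class JoinCollectorSketch {
    // Hypothetical helper: joins strings with ", " inside square brackets.
    static Collector<String, StringJoiner, String> bracketed() {
        return Collector.of(
                () -> new StringJoiner(", ", "[", "]"), // supplier: creates the mutable container
                StringJoiner::add,                       // accumulator: folds one element in
                StringJoiner::merge,                     // combiner: merges two partial containers
                StringJoiner::toString);                 // finisher: container -> final result
    }

    public static void main(String[] args) {
        String joined = Stream.of("a", "b", "c").collect(bracketed());
        System.out.println(joined); // [a, b, c]
    }
}
```

Each argument maps onto one of the four functions of the Collector interface, and the finisher is the optional final transformation mentioned above.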
Collectors.toList() is a static method of the Collectors class. It returns a Collector that accumulates the input elements into a new List. There are no guarantees on the type, mutability, serializability, or thread-safety of the returned List; if more control is required, use toCollection(Supplier).
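The difference between the two can be sketched as follows. The class and method names are hypothetical; only Collectors.toList() and Collectors.toCollection(Supplier) come from the JDK.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListSketch {
    // toList(): the concrete List implementation is unspecified.
    static List<String> anyList() {
        return Stream.of("x", "y").collect(Collectors.toList());
    }

    // toCollection(Supplier): the caller chooses the implementation.
    static LinkedList<String> linked() {
        return Stream.of("x", "y").collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        System.out.println(anyList().size());    // 2
        System.out.println(linked().getFirst()); // x
    }
}
```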
groupingBy returns a Collector implementing a cascaded "group by" operation on input elements of type T, grouping elements according to a classification function and then performing a reduction operation on the values associated with a given key using the specified downstream Collector.
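A small sketch of such a cascaded grouping, assuming word length as the classification function and Collectors.counting() as the downstream collector (the class and method names are invented for this example):

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingBySketch {
    // Hypothetical helper: counts words per word length.
    static Map<Integer, Long> countByLength() {
        return Stream.of("one", "two", "three")
                .collect(Collectors.groupingBy(
                        String::length,          // classification function: key = length
                        Collectors.counting())); // downstream reduction applied per key
    }

    public static void main(String[] args) {
        Map<Integer, Long> counts = countByLength();
        System.out.println(counts.get(3)); // 2 ("one" and "two")
        System.out.println(counts.get(5)); // 1 ("three")
    }
}
```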
Actually, it was originally designed similarly to what you propose. See the early implementation in the Project Lambda repository (makeResult is now supplier). It was later updated to the current design. I believe the rationale for this update was to simplify collector combinators. I did not find any specific discussion on this topic, but my guess is supported by the fact that the mapping collector appeared in the same changeset. Consider the implementation of Collectors.mapping:
public static <T, U, A, R>
Collector<T, ?, R> mapping(Function<? super T, ? extends U> mapper,
                           Collector<? super U, A, R> downstream) {
    BiConsumer<A, ? super U> downstreamAccumulator = downstream.accumulator();
    return new CollectorImpl<>(downstream.supplier(),
                               (r, t) -> downstreamAccumulator.accept(r, mapper.apply(t)),
                               downstream.combiner(),
                               downstream.finisher(),
                               downstream.characteristics());
}
This implementation needs to redefine only the accumulator function, leaving supplier, combiner and finisher as is, so there is no additional indirection when calling supplier, combiner or finisher: you directly call the functions returned by the original collector. This is even more important with collectingAndThen:
public static <T, A, R, RR>
Collector<T, A, RR> collectingAndThen(Collector<T, A, R> downstream,
                                      Function<R, RR> finisher) {
    // ... some characteristics transformations ...
    return new CollectorImpl<>(downstream.supplier(),
                               downstream.accumulator(),
                               downstream.combiner(),
                               downstream.finisher().andThen(finisher),
                               characteristics);
}
Here only the finisher is changed, while the original supplier, accumulator and combiner are used. As the accumulator is called for every element, reducing the indirection could be pretty important. Try to rewrite mapping and collectingAndThen with your proposed design and you will see the problem. New JDK 9 collectors like filtering and flatMapping also benefit from the current design.