The Java API documentation states that the combiner parameter of the collect method must be:
an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function
A combiner is a BiConsumer<R,R> that receives two parameters of type R and returns void. But the documentation does not state whether we should combine the elements into the first or the second parameter.
For instance, the following example may give different results depending on whether the combination is m1.addAll(m2) or m2.addAll(m1).
List<String> res = LongStream
    .rangeClosed(1, 1_000_000)
    .parallel()
    .mapToObj(n -> "" + n)
    .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m1.addAll(m2));
I know that in this case we could simply use a method reference, such as ArrayList::addAll. Yet there are cases where a lambda is required and we must combine the items in the correct order; otherwise we could get an inconsistent result when processing in parallel.
Is this stated anywhere in the Java 8 API documentation? Or does it really not matter?
Of course it matters: when you use m2.addAll(m1) instead of m1.addAll(m2), it doesn’t just change the order of elements, but completely breaks the operation. Since a BiConsumer doesn’t return a result, you have no control over which object the caller will use as the result, and since the caller will use the first one, modifying the second instead will cause data loss.
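The data loss can be illustrated without any stream at all. The following is only a sketch: mergeLikeCaller is a hypothetical stand-in for what the stream implementation does after the combiner returns, under the assumption stated above that the caller keeps using the first container.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

class CombinerDirectionDemo {

    // Hypothetical stand-in for the caller: it merges two partial results
    // and then keeps using the FIRST container (assumption taken from the text above).
    static List<String> mergeLikeCaller(BiConsumer<List<String>, List<String>> combiner) {
        List<String> first = new ArrayList<>(Arrays.asList("1", "2"));
        List<String> second = new ArrayList<>(Arrays.asList("3", "4"));
        combiner.accept(first, second);
        return first; // the second container is discarded
    }

    public static void main(String[] args) {
        // Folding the second container into the first keeps every element.
        System.out.println(mergeLikeCaller((m1, m2) -> m1.addAll(m2))); // [1, 2, 3, 4]
        // Folding the first into the second leaves the first untouched: "3" and "4" are lost.
        System.out.println(mergeLikeCaller((m1, m2) -> m2.addAll(m1))); // [1, 2]
    }
}
```

With the reversed combiner the call still completes without any error, which is why the mistake is easy to miss.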
There is a hint if you look at the accumulator function, which has the type BiConsumer<R,? super T>; in other words, it can’t do anything other than store the element of type T, provided as the second argument, into the container of type R, provided as the first argument.
If you look at the documentation of Collector, which uses a BinaryOperator as its combiner function and hence allows the combiner to decide which argument to return (or even to return an entirely different result instance), you find:
The associativity constraint says that splitting the computation must produce an equivalent result. That is, for any input elements t1 and t2, the results r1 and r2 in the computation below must be equivalent:
A a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
R r1 = finisher.apply(a1); // result without splitting

A a2 = supplier.get();
accumulator.accept(a2, t1);
A a3 = supplier.get();
accumulator.accept(a3, t2);
R r2 = finisher.apply(combiner.apply(a2, a3)); // result with splitting
So if we assume that the accumulator is applied in encounter order, the combiner has to combine the first and second argument in left-to-right order to produce an equivalent result.
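The quoted constraint can be checked by hand for a list-building Collector. This is a minimal sketch: the class name, the helpers unsplit and split, and the two collectors are made up for illustration; only Collector.of and the Collector accessor methods are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class AssociativityCheck {

    // Combiner that appends the second container to the first (left-to-right).
    static Collector<String, List<String>, List<String>> leftToRight() {
        return Collector.of(ArrayList::new, List::add, (a, b) -> { a.addAll(b); return a; });
    }

    // Combiner that appends the first container to the second (right-to-left).
    static Collector<String, List<String>, List<String>> rightToLeft() {
        return Collector.of(ArrayList::new, List::add, (a, b) -> { b.addAll(a); return b; });
    }

    // r1 from the quoted documentation: result without splitting.
    static <T, A, R> R unsplit(Collector<T, A, R> c, T t1, T t2) {
        A a1 = c.supplier().get();
        c.accumulator().accept(a1, t1);
        c.accumulator().accept(a1, t2);
        return c.finisher().apply(a1);
    }

    // r2 from the quoted documentation: result with splitting.
    static <T, A, R> R split(Collector<T, A, R> c, T t1, T t2) {
        A a2 = c.supplier().get();
        c.accumulator().accept(a2, t1);
        A a3 = c.supplier().get();
        c.accumulator().accept(a3, t2);
        return c.finisher().apply(c.combiner().apply(a2, a3));
    }

    public static void main(String[] args) {
        System.out.println(unsplit(leftToRight(), "t1", "t2")); // [t1, t2]
        System.out.println(split(leftToRight(), "t1", "t2"));   // [t1, t2]
        System.out.println(split(rightToLeft(), "t1", "t2"));   // [t2, t1]
    }
}
```

For an ordered container such as a list, only the left-to-right combiner makes the split and unsplit results equal.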
Now, the three-arg version of Stream.collect has a slightly different signature, using a BiConsumer as combiner precisely to support method references like ArrayList::addAll. Assuming consistency throughout all these operations and considering the purpose of this signature change, we can safely assume that the first argument is the container to modify.
But it seems that this is a late change and the documentation hasn’t been adapted accordingly. If you look at the Mutable reduction section of the package documentation, you will find that it has been adapted to show the actual Stream.collect’s signature and usage examples, but repeats exactly the same definition regarding the associativity constraint as shown above, despite the fact that finisher.apply(combiner.apply(a2, a3)) doesn’t work if combiner is a BiConsumer…
The documentation issue has been reported as JDK-8164691 and addressed in Java 9. The new documentation says:
combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.
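For a case where a lambda is needed rather than a method reference, here is a sketch of a combiner that follows the rule quoted above. The class and method names are hypothetical, and the comma-joining logic is just one possible example of a merge that a plain method reference cannot express.

```java
import java.util.stream.IntStream;

class FoldIntoFirstDemo {

    // Joins the numbers 1..n with commas using the three-arg Stream.collect.
    static String joinRange(int n) {
        StringBuilder result = IntStream.rangeClosed(1, n)
            .parallel()
            .mapToObj(Integer::toString)
            .collect(
                StringBuilder::new,
                // accumulator: store the element (second arg) into the container (first arg)
                (sb, s) -> {
                    if (sb.length() > 0) sb.append(',');
                    sb.append(s);
                },
                // combiner: fold the second container into the first, never the reverse
                (left, right) -> {
                    if (left.length() > 0 && right.length() > 0) left.append(',');
                    left.append(right);
                });
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinRange(10));
    }
}
```

Because the stream is ordered and the combiner appends right onto left, the parallel result keeps the encounter order.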