
Where is the combination order of the combiner in collect(supplier, accumulator, combiner) defined?

The Java API documentation states that the combiner parameter of the collect method must be:

an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function

A combiner is a BiConsumer<R,R> that receives two parameters of type R and returns void. But the documentation does not state whether we should combine the elements into the first or the second parameter.

For instance, the following example may give different results depending on whether the combination is m1.addAll(m2) or m2.addAll(m1).

List<String> res = LongStream
     .rangeClosed(1, 1_000_000)
     .parallel()
     .mapToObj(n -> "" + n)
     .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m1.addAll(m2));

I know that in this case we could simply use a method reference, such as ArrayList::addAll. Yet there are some cases where a lambda is required and we must combine the items in the correct order; otherwise we could get an inconsistent result when processing in parallel.

Is this specified anywhere in the Java 8 API documentation? Or does it really not matter?

Miguel Gamboa asked May 29 '15 09:05




1 Answer

Of course, it matters, as when you use m2.addAll(m1) instead of m1.addAll(m2), it doesn’t just change the order of elements, but completely breaks the operation. Since a BiConsumer doesn’t return a result, you have no control over which object the caller will use as the result and since the caller will use the first one, modifying the second instead will cause data loss.
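The difference can be tried out with a minimal sketch like the one below. It assumes a smaller range than the question's one million elements, and the names CombinerOrderDemo and collectWith are made up for illustration. How many elements the reversed combiner loses depends on how the stream implementation splits the work, so the sketch only prints the resulting size instead of promising a specific number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class CombinerOrderDemo {
    // Hypothetical helper: runs the question's pipeline with the given combiner.
    static List<String> collectWith(
            long n, BiConsumer<ArrayList<String>, ArrayList<String>> combiner) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .mapToObj(i -> "" + i)
                .collect(ArrayList::new, ArrayList::add, combiner);
    }

    public static void main(String[] args) {
        long n = 10_000; // assumption: any size large enough to be split
        List<String> expected = LongStream.rangeClosed(1, n)
                .mapToObj(i -> "" + i)
                .collect(Collectors.toList());

        // Folds the second container into the first: nothing is lost,
        // and the encounter order is preserved.
        List<String> good = collectWith(n, (m1, m2) -> m1.addAll(m2));
        System.out.println("m1.addAll(m2) matches sequential result: "
                + good.equals(expected));

        // Folds the first container into the second: the caller keeps
        // using m1, so whatever was moved into m2 is typically dropped.
        List<String> bad = collectWith(n, (m1, m2) -> m2.addAll(m1));
        System.out.println("m2.addAll(m1) kept " + bad.size() + " of " + n);
    }
}
```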

There is a hint if you look at the accumulator function, which has the type BiConsumer<R,? super T>: in other words, it can’t do anything other than store the element of type T, provided as the second argument, into the container of type R, provided as the first argument.

If you look at the documentation of Collector, which uses a BinaryOperator as combiner function, hence allows the combiner to decide which argument to return (or even an entirely different result instance), you find:

The associativity constraint says that splitting the computation must produce an equivalent result. That is, for any input elements t1 and t2, the results r1 and r2 in the computation below must be equivalent:

A a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
R r1 = finisher.apply(a1);  // result without splitting

A a2 = supplier.get();
accumulator.accept(a2, t1);
A a3 = supplier.get();
accumulator.accept(a3, t2);
R r2 = finisher.apply(combiner.apply(a2, a3));  // result with splitting

So if we assume that the accumulator is applied in encounter order, the combiner has to combine the first and second argument in left-to-right order to produce an equivalent result.
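The constraint quoted above can be checked by hand without any parallel stream. The following is only a sketch under assumptions: it builds a simple list collector with Collector.of, and the names AssociativityCheck, unsplit and split are hypothetical. It evaluates both halves of the computation and compares a left-to-right combiner with a reversed one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Collector;

class AssociativityCheck {
    // Hypothetical helper: builds a list collector around the given combiner.
    static Collector<String, List<String>, List<String>> listCollector(
            BinaryOperator<List<String>> combiner) {
        return Collector.of(ArrayList::new, List::add, combiner);
    }

    // r1 in the quoted documentation: result without splitting.
    static List<String> unsplit(BinaryOperator<List<String>> combiner,
                                String t1, String t2) {
        Collector<String, List<String>, List<String>> c = listCollector(combiner);
        List<String> a1 = c.supplier().get();
        c.accumulator().accept(a1, t1);
        c.accumulator().accept(a1, t2);
        return c.finisher().apply(a1);
    }

    // r2 in the quoted documentation: result with splitting.
    static List<String> split(BinaryOperator<List<String>> combiner,
                              String t1, String t2) {
        Collector<String, List<String>, List<String>> c = listCollector(combiner);
        List<String> a2 = c.supplier().get();
        c.accumulator().accept(a2, t1);
        List<String> a3 = c.supplier().get();
        c.accumulator().accept(a3, t2);
        return c.finisher().apply(c.combiner().apply(a2, a3));
    }

    public static void main(String[] args) {
        BinaryOperator<List<String>> leftToRight = (a, b) -> { a.addAll(b); return a; };
        BinaryOperator<List<String>> reversed = (a, b) -> { b.addAll(a); return b; };

        // Left-to-right: both halves yield [x, y], so the results are equivalent.
        System.out.println(unsplit(leftToRight, "x", "y")
                .equals(split(leftToRight, "x", "y"))); // prints true

        // Reversed: the split half yields [y, x], which differs from [x, y].
        System.out.println(unsplit(reversed, "x", "y")
                .equals(split(reversed, "x", "y"))); // prints false
    }
}
```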


Now, the three-arg version of Stream.collect has a slightly different signature, using a BiConsumer as combiner exactly for supporting method references like ArrayList::addAll. Assuming consistency throughout all these operations and considering the purpose of this signature change, we can safely assume that it has to be the first argument which is the container to modify.

But it seems that this was a late change and the documentation hasn’t been adapted accordingly. If you look at the Mutable reduction section of the package documentation, you will find that it has been adapted to show the actual signature of Stream.collect and usage examples, but it repeats exactly the same definition regarding the associativity constraint as shown above, despite the fact that finisher.apply(combiner.apply(a2, a3)) doesn’t work if combiner is a BiConsumer.


The documentation issue has been reported as JDK-8164691 and addressed in Java 9. The new documentation says:

combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.
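For a case where a lambda is needed rather than a method reference, the rule above might be applied as in the following sketch. It assumes a word-counting use case that is not part of the question; the class name WordCountCollect and the method count are hypothetical. The combiner folds every entry of the second map into the first and leaves the second untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

class WordCountCollect {
    // Hypothetical example: counts occurrences using the three-argument collect.
    static Map<String, Integer> count(Stream<String> words) {
        return words.collect(
                HashMap::new,
                // Accumulator: store the element (second argument) in the
                // container (first argument).
                (map, w) -> map.merge(w, 1, Integer::sum),
                // Combiner: fold the second container into the first one.
                (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v, Integer::sum)));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count(Stream.of("a", "b", "a", "c", "a").parallel());
        System.out.println(counts.get("a")); // prints 3
        System.out.println(counts.get("b")); // prints 1
    }
}
```

Map::putAll would not be a substitute here, because it overwrites counts for keys present in both maps instead of adding them.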

Holger answered Sep 19 '22 17:09