Where is the combination order of the combiner of collect(supplier, accumulator, combiner) defined?

Tags:

java

java-8

java-stream

The Java API documentation states that the combiner parameter of the collect method must be:

an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function

A combiner is a BiConsumer<R,R> that receives two parameters of type R and returns void. But the documentation does not state whether we should combine the elements into the first or the second parameter.

For instance, the following example may give different results depending on whether the combiner is m1.addAll(m2) or m2.addAll(m1).

List<String> res = LongStream
     .rangeClosed(1, 1_000_000)
     .parallel()
     .mapToObj(n -> "" + n)
     .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m1.addAll(m2));

I know that in this case we could simply use a method reference, such as ArrayList::addAll. Yet there are some cases where a lambda is required and we must combine the items in the correct order, otherwise we could get an inconsistent result when processing in parallel.

Is this stated anywhere in the Java 8 API documentation? Or does it really not matter?

asked May 29 '15 09:05

Miguel Gamboa

1 Answer

Of course it matters: using m2.addAll(m1) instead of m1.addAll(m2) doesn’t just change the order of elements, it completely breaks the operation. Since a BiConsumer doesn’t return a result, you have no control over which object the caller will use as the result, and since the caller will use the first one, modifying the second instead will cause data loss.
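
A minimal sketch of that data loss, assuming two hand-filled partial containers instead of a real parallel run. The class CombinerOrderDemo and its methods are hypothetical names used only for illustration, and the caller is modelled as simply keeping the first argument:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class CombinerOrderDemo {

    // Models the caller of a BiConsumer combiner: it hands over two partial
    // containers and afterwards keeps using the first one as the result.
    public static List<String> combine(BiConsumer<List<String>, List<String>> combiner) {
        List<String> left = new ArrayList<>(Arrays.asList("1", "2"));
        List<String> right = new ArrayList<>(Arrays.asList("3", "4"));
        combiner.accept(left, right);
        return left;
    }

    // Folds the second container into the first.
    public static String correct() {
        return combine((m1, m2) -> m1.addAll(m2)).toString();
    }

    // Folds the first container into the second, which the caller then discards.
    public static String swapped() {
        return combine((m1, m2) -> m2.addAll(m1)).toString();
    }

    public static void main(String[] args) {
        System.out.println(correct()); // [1, 2, 3, 4]
        System.out.println(swapped()); // [1, 2]
    }
}
```

With the swapped combiner, the elements "3" and "4" end up only in the container that is thrown away.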

There is a hint if you look at the accumulator function, which has the type BiConsumer<R,? super T>. In other words, it can’t do anything other than store the element of type T, provided as the second argument, into the container of type R, provided as the first argument.

If you look at the documentation of Collector, which uses a BinaryOperator as combiner function, hence allows the combiner to decide which argument to return (or even an entirely different result instance), you find:

The associativity constraint says that splitting the computation must produce an equivalent result. That is, for any input elements t1 and t2, the results r1 and r2 in the computation below must be equivalent:
A a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
R r1 = finisher.apply(a1);  // result without splitting

A a2 = supplier.get();
accumulator.accept(a2, t1);
A a3 = supplier.get();
accumulator.accept(a3, t2);
R r2 = finisher.apply(combiner.apply(a2, a3));  // result with splitting

So if we assume that the accumulator is applied in encounter order, the combiner has to combine the first and second argument in left-to-right order to produce an equivalent result.
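
The quoted computation can be run directly against a Collector. The following is only a sketch: AssociativityCheck, leftToRight and rightToLeft are hypothetical names, and plain equals stands in for "equivalent", which seems reasonable for an ordered list collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

public class AssociativityCheck {

    // Runs both computations from the quoted constraint for the given collector
    // and reports whether the unsplit and the split result are equal.
    public static <T, A, R> boolean splitMatchesUnsplit(Collector<T, A, R> c, T t1, T t2) {
        A a1 = c.supplier().get();
        c.accumulator().accept(a1, t1);
        c.accumulator().accept(a1, t2);
        R r1 = c.finisher().apply(a1); // result without splitting

        A a2 = c.supplier().get();
        c.accumulator().accept(a2, t1);
        A a3 = c.supplier().get();
        c.accumulator().accept(a3, t2);
        R r2 = c.finisher().apply(c.combiner().apply(a2, a3)); // result with splitting

        return r1.equals(r2);
    }

    // Combiner that appends the second container to the first.
    public static Collector<String, List<String>, List<String>> leftToRight() {
        return Collector.of(ArrayList::new, List::add,
                (left, right) -> { left.addAll(right); return left; });
    }

    // Combiner that appends the first container to the second.
    public static Collector<String, List<String>, List<String>> rightToLeft() {
        return Collector.of(ArrayList::new, List::add,
                (left, right) -> { right.addAll(left); return right; });
    }

    public static void main(String[] args) {
        System.out.println(splitMatchesUnsplit(leftToRight(), "a", "b")); // true
        System.out.println(splitMatchesUnsplit(rightToLeft(), "a", "b")); // false
    }
}
```

Note that rightToLeft loses no data, because a BinaryOperator can return whichever container it filled. It still fails the check, because the split result comes out as [b, a] instead of [a, b].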

Now, the three-arg version of Stream.collect has a slightly different signature, using a BiConsumer as combiner precisely to support method references like ArrayList::addAll. Assuming consistency throughout all these operations and considering the purpose of this signature change, we can safely assume that the first argument is the container to modify.
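
For a case where no method reference fits and a lambda has to be written, here is a sketch of counting occurrences into a map. The class LambdaCombinerDemo and its count method are made up for illustration; the combiner follows the same convention and merges the second map into the first:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class LambdaCombinerDemo {

    // Counts occurrences using the three-arg collect. HashMap::putAll would
    // overwrite counts instead of adding them, so the combiner is a lambda
    // that merges every entry of the second map (m2) into the first map (m1).
    public static Map<String, Integer> count(Stream<String> words) {
        return words.collect(
                HashMap::new,
                (map, word) -> map.merge(word, 1, Integer::sum),
                (m1, m2) -> m2.forEach((k, v) -> m1.merge(k, v, Integer::sum)));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count(Stream.of("a", "b", "a", "c", "b", "a").parallel());
        System.out.println(counts);
    }
}
```

Even on a parallel stream, "a" is counted three times, "b" twice and "c" once. Writing the combiner the other way round, merging m1 into m2, would drop the counts of every partial map except the first.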

But it seems that this was a late change and the documentation wasn’t adapted accordingly. If you look at the Mutable reduction section of the package documentation, you will find that it has been adapted to show the actual signature of Stream.collect and usage examples, but repeats exactly the same definition of the associativity constraint as shown above, despite the fact that finisher.apply(combiner.apply(a2, a3)) doesn’t work if combiner is a BiConsumer…

The documentation issue has been reported as JDK-8164691 and addressed in Java 9. The new documentation says:

combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.

answered Sep 19 '22 17:09

Holger
