Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Purpose of third argument to 'reduce' function in Java 8 functional programming

Under what circumstances is the third argument to 'reduce' called in Java 8 streams?

The code below attempts to traverse a list of strings and add up the code point values of the first character of each. The value returned by the final lambda never seems to be used and, if you insert a println, it never seems to be invoked. The documentation describes it as a 'combiner' but I cant find more detail...

int result =   data.stream().reduce(0, (total,s) -> total + s.codePointAt(0), (a,b) -> 1000000);  
like image 327
Garth Gilmour Avatar asked Apr 02 '14 10:04

Garth Gilmour


People also ask

What is the use of reduce function in Java 8?

In Java, reducing is a terminal operation that aggregates a stream into a type or a primitive type. Java 8 provides Stream API contains set of predefined reduction operations such as average(), sum(), min(), max(), and count(). These operations return a value by combining the elements of a stream.

What is significance of reduce () in Java?

Reducing is the repeated process of combining all elements. reduce operation applies a binary operator to each element in the stream where the first argument to the operator is the return value of the previous application and second argument is the current stream element.

What is identity in reduce?

Identity – an element that is the initial value of the reduction operation and the default result if the stream is empty. Accumulator – a function that takes two parameters: a partial result of the reduction operation and the next element of the stream.

What is combiner In reduce?

In parallel processing we can pass combiner function as additional parameter to this method. Stream reduce() can be used to get the sum of numbers stored in collection. It can also concatenate the string data stored in collection with given separator.


2 Answers

Are you talking about this function?

reduce <U> U reduce(U identity,              BiFunction<U,? super T,U> accumulator,              BinaryOperator<U> combiner)  

Performs a reduction on the elements of this stream, using the provided identity, accumulation and combining functions. This is equivalent to:

 U result = identity;  for (T element : this stream)      result = accumulator.apply(result, element)  return result;    

but is not constrained to execute sequentially. The identity value must be an identity for the combiner function. This means that for all u, combiner(identity, u) is equal to u. Additionally, the combiner function must be compatible with the accumulator function; for all u and t, the following must hold:

 combiner.apply(u, accumulator.apply(identity, t)) ==       accumulator.apply(u, t)    

This is a terminal operation.

API Note: Many reductions using this form can be represented more simply by an explicit combination of map and reduce operations. The accumulator function acts as a fused mapper and accumulator, which can sometimes be more efficient than separate mapping and reduction, such as when knowing the previously reduced value allows you to avoid some computation. Type Parameters: U - The type of the result Parameters: identity - the identity value for the combiner function accumulator - an associative, non-interfering, stateless function for incorporating an additional element into a result combiner - an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function Returns: the result of the reduction See Also: reduce(BinaryOperator), reduce(Object, BinaryOperator)

I assume its purpose is to allow parallel computation, and so my guess is that it's only used if the reduction is performed in parallel. If it's performed sequentially, there's no need to use combiner. I do not know this for sure -- I'm just guessing based on the doc comment "[...] is not constrained to execute sequentially" and the many other mentions of "parallel execution" in the comments.

like image 52
Matt Fenwick Avatar answered Oct 14 '22 18:10

Matt Fenwick


I think Reduction operations paragraph from java.util.stream package summary can answer the question. Let me cite the most important part here:


In its more general form, a reduce operation on elements of type <T> yielding a result of type <U> requires three parameters:

<U> U reduce(U identity,               BiFunction<U, ? super T, U> accumulator,               BinaryOperator<U> combiner); 

Here, the identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result. The combiner function combines two partial results to produce a new partial result. (The combiner is necessary in parallel reductions, where the input is partitioned, a partial accumulation computed for each partition, and then the partial results are combined to produce a final result.) More formally, the identity value must be an identity for the combiner function. This means that for all u, combiner.apply(identity, u) is equal to u. Additionally, the combiner function must be associative and must be compatible with the accumulator function: for all u and t, combiner.apply(u, accumulator.apply(identity, t)) must be equals() to accumulator.apply(u, t).

The three-argument form is a generalization of the two-argument form, incorporating a mapping step into the accumulation step. We could re-cast the simple sum-of-weights example using the more general form as follows:

 int sumOfWeights = widgets.stream()                            .reduce(0,                                    (sum, b) -> sum + b.getWeight())                                    Integer::sum); 

though the explicit map-reduce form is more readable and therefore should usually be preferred. The generalized form is provided for cases where significant work can be optimized away by combining mapping and reducing into a single function.


In other words, as far as I understand, the three-argument form is useful in two cases:

  1. When parallel execution matters.
  2. When significant performance optimization can be achieved by combining mapping and accumulation steps. Otherwise more simple and readable explicit map-reduce form can be used.

Explicit form is mentioned previously in the same doc:

int sumOfWeights = widgets.parallelStream()         .filter(b -> b.getColor() == RED)         .mapToInt(b -> b.getWeight())         .sum(); 
like image 39
Andrii Polunin Avatar answered Oct 14 '22 17:10

Andrii Polunin