The normal reduction is meant to combine two immutable values such as int, double, etc. and produce a new one; it's an immutable reduction. In contrast, the collect method is designed to mutate a container to accumulate the result it's supposed to produce.
A reduction is a terminal operation that aggregates a stream into a type or a primitive. The Java 8 Stream API contains a set of predefined reduction operations, such as average, sum, min, max, and count, which return one value by combining the elements of a stream.
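For instance (a minimal sketch, not from the question, assuming java.util.stream.IntStream is imported), a few of those predefined reductions look like this:
// Each predefined reduction collapses the whole stream into a single value
int sum = IntStream.of(1, 2, 3, 4).sum();                    // 10
OptionalInt max = IntStream.of(1, 2, 3, 4).max();            // OptionalInt[4]
long count = IntStream.of(1, 2, 3, 4).count();               // 4
OptionalDouble average = IntStream.of(1, 2, 3, 4).average(); // OptionalDouble[2.5]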
The Java Collections framework is used for storing and manipulating groups of data. A collection is an in-memory data structure, and every element has to be computed before it can be added to the collection. The Stream API, by contrast, is used only for processing groups of data.
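To sketch that difference (an illustration, not from the original answer): a collection holds every element in memory up front, whereas a stream pipeline computes elements only when the terminal operation asks for them:
// A collection stores all of its elements before you can work with them
List<Integer> numbers = Arrays.asList(1, 2, 3);

// A stream describes a computation; elements are produced lazily, on demand
int firstSquareOver100 = Stream.iterate(1, n -> n + 1)   // conceptually unbounded source
        .map(n -> n * n)
        .filter(square -> square > 100)
        .findFirst()                                     // short-circuiting terminal operation
        .get();                                          // 121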
Reducing is the repeated process of combining all elements.
reduce is a "fold" operation: it applies a binary operator to each element in the stream, where the first argument to the operator is the return value of the previous application and the second argument is the current stream element.
collect is an aggregation operation where a "collection" is created and each element is "added" to that collection. Collections built up in different parts of the stream are then added together.
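As a quick illustration of the two shapes (a sketch with made-up data):
List<Integer> numbers = Arrays.asList(1, 2, 3, 4);

// reduce: a fold -- each step combines the result so far with the next element
int sum = numbers.stream()
        .reduce(0, Integer::sum);        // ((((0 + 1) + 2) + 3) + 4) = 10

// collect: each element is accumulated into a container that the operation mutates
Set<Integer> unique = numbers.stream()
        .collect(Collectors.toSet());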
The document you linked gives the reason for having two different approaches:
If we wanted to take a stream of strings and concatenate them into a single long string, we could achieve this with ordinary reduction:
String concatenated = strings.reduce("", String::concat);
We would get the desired result, and it would even work in parallel. However, we might not be happy about the performance! Such an implementation would do a great deal of string copying, and the run time would be O(n^2) in the number of characters. A more performant approach would be to accumulate the results into a StringBuilder, which is a mutable container for accumulating strings. We can use the same technique to parallelize mutable reduction as we do with ordinary reduction.
So the point is that the parallelisation is the same in both cases, but in the reduce case we apply the function to the stream elements themselves, while in the collect case we apply the function to a mutable container.
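The StringBuilder variant hinted at above could look like this (a sketch; strings is assumed to be a Stream<String> as in the quoted example):
String concatenated = strings
        .collect(StringBuilder::new,      // supplier: a fresh mutable container
                 StringBuilder::append,   // accumulator: append each element to the container
                 StringBuilder::append)   // combiner: merge containers from parallel sub-streams
        .toString();
// Collectors.joining() packages up exactly this kind of mutable reduction for you.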
The reason is simply that collect() can only work with mutable result objects, while reduce() is designed to work with immutable result objects.
"reduce() with immutable" example:
public class Employee {
    private Integer salary;

    public Employee(String aSalary) {
        this.salary = Integer.valueOf(aSalary);
    }

    public Integer getSalary() {
        return this.salary;
    }
}
@Test
public void testReduceWithImmutable() {
    List<Employee> list = new LinkedList<>();
    list.add(new Employee("1"));
    list.add(new Employee("2"));
    list.add(new Employee("3"));

    Integer sum = list
            .stream()
            .map(Employee::getSalary)
            .reduce(0, (Integer a, Integer b) -> Integer.sum(a, b));

    assertEquals(Integer.valueOf(6), sum);
}
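The same immutable reduction could also be written with one of the predefined reductions mentioned earlier (a sketch reusing the same Employee and list):
int sum = list.stream()
        .mapToInt(Employee::getSalary)   // unboxes each Integer salary
        .sum();                          // predefined reduction: 1 + 2 + 3 = 6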
"collect() with mutable" example:
For example, if you would like to manually calculate a sum using collect(), it cannot work with BigDecimal but only with something like MutableInt from org.apache.commons.lang.mutable. See:
public class Employee {
    private MutableInt salary;

    public Employee(String aSalary) {
        this.salary = new MutableInt(aSalary);
    }

    public MutableInt getSalary() {
        return this.salary;
    }
}
@Test
public void testCollectWithMutable() {
    List<Employee> list = new LinkedList<>();
    list.add(new Employee("1"));
    list.add(new Employee("2"));

    MutableInt sum = list.stream().collect(
            MutableInt::new,
            (MutableInt container, Employee employee) ->
                    container.add(employee.getSalary().intValue()),
            MutableInt::add);

    assertEquals(new MutableInt(3), sum);
}
This works because the accumulator container.add(employee.getSalary().intValue()) is not supposed to return a new object with the result, but to change the state of the mutable container of type MutableInt.
If you would like to use BigDecimal instead for the container, you could not use the collect() method, as container.add(employee.getSalary()) would not change the container, because BigDecimal is immutable. (Apart from this, BigDecimal::new would not work, as BigDecimal has no no-argument constructor.)
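For contrast, immutable values like BigDecimal are exactly what reduce() is designed for; a sketch, assuming a hypothetical Employee variant whose getSalary() returns a BigDecimal:
BigDecimal total = list.stream()
        .map(Employee::getSalary)                  // hypothetical BigDecimal salary
        .reduce(BigDecimal.ZERO, BigDecimal::add); // every step yields a brand-new BigDecimal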
To illustrate the problem, let's suppose you want to achieve Collectors.toList() using a simple reduction like this:
List<Integer> numbers = stream.reduce(
        new ArrayList<Integer>(),
        (List<Integer> l, Integer e) -> {
            l.add(e);
            return l;
        },
        (List<Integer> l1, List<Integer> l2) -> {
            l1.addAll(l2);
            return l1;
        });
This is the equivalent of Collectors.toList(). However, in this case you mutate the List<Integer>. As we know, ArrayList is not thread-safe, nor is it safe to add/remove values from it while iterating over it, so you will get a ConcurrentModificationException or an ArrayIndexOutOfBoundsException or some other exception (especially when run in parallel) when you update the list or when the combiner tries to merge the lists, because you are mutating the list by accumulating (adding) the integers to it. If you want to make this thread-safe, you need to pass a new list each time, which would impair performance.
In contrast, Collectors.toList() works in a similar fashion. However, it guarantees thread safety when you accumulate the values into the list. From the documentation for the collect method:
Performs a mutable reduction operation on the elements of this stream using a Collector. If the stream is parallel, and the Collector is concurrent, and either the stream is unordered or the collector is unordered, then a concurrent reduction will be performed. When executed in parallel, multiple intermediate results may be instantiated, populated, and merged so as to maintain isolation of mutable data structures. Therefore, even when executed in parallel with non-thread-safe data structures (such as ArrayList), no additional synchronization is needed for a parallel reduction.
So to answer your question:
When would you use collect() vs reduce()?
If you have immutable values such as ints, doubles, or Strings, then normal reduction works just fine. However, if you have to reduce your values into, say, a List (a mutable data structure), then you need to use mutable reduction with the collect method.
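Concretely, the thread-safe way to build that list is one of the collect() forms (a sketch; stream stands for the same Stream<Integer> as above, and either line on its own would do):
// The predefined collector:
List<Integer> viaCollector = stream.collect(Collectors.toList());

// Or the same mutable reduction spelled out as supplier / accumulator / combiner;
// the framework keeps the intermediate ArrayLists isolated, so no extra synchronization
// is needed even for a parallel stream:
List<Integer> viaSupplier = stream.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);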
Let the stream be a <- b <- c <- d
In reduction, you will have ((a # b) # c) # d, where # is that interesting operation that you would like to do.
In collection, your collector will have some kind of collecting structure K.
K consumes a. K then consumes b. K then consumes c. K then consumes d.
At the end, you ask K what the final result is.
K then gives it to you.
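A tiny sketch of both shapes on the stream a, b, c, d (using string concatenation as the # operation, purely for illustration):
// Reduction: a fold, evaluated as ((a # b) # c) # d when run sequentially
String folded = Stream.of("a", "b", "c", "d")
        .reduce("", (x, y) -> x.isEmpty() ? y : "(" + x + " # " + y + ")");
// folded == "(((a # b) # c) # d)"

// Collection: a container K (a StringJoiner inside Collectors.joining) consumes
// a, then b, then c, then d, and hands back the final result at the end
String collected = Stream.of("a", "b", "c", "d")
        .collect(Collectors.joining(" # "));     // "a # b # c # d"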