
Java8 stream.reduce() with 3 parameters - getting transparency

I wrote this code to reduce a list of words to a long count of how many words start with an 'A'. I'm just writing it to learn Java 8, so I'd like to understand it a little better [Disclaimer: I realize this is probably not the best way to write this code; it's just for practice!].

Long countOfAWords = results.stream().reduce(
    0L,
    (a, b) -> b.charAt(0) == 'A' ? a + 1 : a,
    Long::sum);

The middle parameter/lambda (called the accumulator) would seem to be capable of reducing the full list without the final 'Combiner' parameter. In fact, the Javadoc actually says:

The {@code accumulator} function acts as a fused mapper and accumulator, which can sometimes be more efficient than separate mapping and reduction, such as when knowing the previously reduced value allows you to avoid some computation.

[Edit From Author] - The following statement is wrong, so don't let it confuse you; I'm just keeping it here so I don't ruin the original context of the answers.

Anyway, I can infer that the accumulator must just be outputting 1's and 0's which the combiner combines. I didn't find this particularly obvious from the documentation though.

My Question

Is there a way to see what the output would be before the combiner executes so I can see the list of 1's and 0's that the combiner combines? This would be helpful in debugging more complex situations which I'm sure I'll come across eventually.

John Humphreys asked May 03 '15 15:05



2 Answers

The combiner does not reduce a list of 0's and 1's. When the stream is not run in parallel, the combiner is not used at all in this case, so the reduction is equivalent to the following loop:

U result = identity;
for (T element : this stream)
    result = accumulator.apply(result, element);
return result;
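To make that loop concrete, here is a minimal runnable sketch that compares it with the three-argument reduce on a sequential stream. The class name SequentialReduceDemo, the helper reduceWithLoop, and the sample word list are made up for illustration; only the accumulator logic comes from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

class SequentialReduceDemo {

    // Hypothetical helper: a hand-written loop mirroring the Javadoc pseudocode.
    static <T, U> U reduceWithLoop(List<T> elements, U identity,
                                   BiFunction<U, ? super T, U> accumulator) {
        U result = identity;
        for (T element : elements) {
            result = accumulator.apply(result, element);
        }
        return result;
    }

    // The accumulator from the question: add 1 when the word starts with 'A'.
    // Assumes non-empty words; charAt(0) throws on an empty string.
    static Long countA(Long count, String word) {
        return word.charAt(0) == 'A' ? count + 1 : count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("A", "B", "A", "A", "C", "A", "A");
        AtomicInteger combinerCalls = new AtomicInteger();

        // Same reduction as in the question, with a combiner that counts its own invocations.
        Long viaStream = words.stream().reduce(
                0L,
                SequentialReduceDemo::countA,
                (x, y) -> { combinerCalls.incrementAndGet(); return x + y; });
        Long viaLoop = reduceWithLoop(words, 0L, SequentialReduceDemo::countA);

        System.out.println("stream=" + viaStream + " loop=" + viaLoop
                + " combinerCalls=" + combinerCalls.get());
    }
}
```

Both results should agree. On the sequential stream the combiner counter is expected to stay at zero, which is the behaviour described above, although the API documentation does not spell that out as a guarantee.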

When you run the stream in parallel, the task is split across multiple threads. For example, the data in the pipeline is partitioned into chunks that are evaluated independently, each producing its own partial result. The combiner is then used to merge these results.

So you won't see a list being reduced. Instead, each combiner call receives two partial results (each either the identity value or a value computed by a subtask) and sums them. For example, if you add a print statement in the combiner

(i1, i2) -> {System.out.println("Merging: "+i1+"-"+i2); return i1+i2;}); 

you could see something like this:

Merging: 0-0
Merging: 0-0
Merging: 1-0
Merging: 1-0
Merging: 1-1
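If you want to try this yourself, a self-contained sketch could look like the following. The class name ParallelCombinerTrace, the method countAWords, and the sample list are assumptions for illustration. The merges are collected in a thread-safe queue instead of being printed directly, because the combiner may run on several threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ParallelCombinerTrace {

    // Runs the question's reduction in parallel and records every pair the combiner merges.
    // Assumes non-empty words; charAt(0) throws on an empty string.
    static long countAWords(List<String> words, Queue<String> merges) {
        return words.parallelStream().reduce(
                0L,
                (count, word) -> word.charAt(0) == 'A' ? count + 1 : count,
                (left, right) -> {
                    merges.add(left + "-" + right); // record the two partial results
                    return left + right;
                });
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("A", "B", "A", "A", "C", "A", "A");
        Queue<String> merges = new ConcurrentLinkedQueue<>();

        long count = countAWords(words, merges);

        merges.forEach(m -> System.out.println("Merging: " + m));
        System.out.println("Count: " + count);
    }
}
```

The number and order of the "Merging" lines depend on how the stream happens to be split, so they can differ between runs and machines; only the final count is stable.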

This would be helpful in debugging more complex situations which I'm sure I'll come across eventually.

More generally, if you want to see the data as it flows through the pipeline, you can use peek (the debugger can also help). Applied to your example:

long countOfAWords = results.stream().map(s -> s.charAt(0) == 'A' ? 1 : 0).peek(System.out::print).mapToLong(l -> l).sum();

which can output:

100100
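As a variation, peek can also capture the intermediate values so you can inspect them afterwards instead of printing them. This is only a sketch: the class name PeekTrace, the method countAWords, and the sample list are invented for the example, and it assumes a sequential stream, since a plain ArrayList would not be safe to fill from a parallel one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PeekTrace {

    // Maps each word to 1 or 0, records each flag via peek, then sums the flags.
    // Assumes non-empty words; charAt(0) throws on an empty string.
    static long countAWords(List<String> words, List<Integer> seen) {
        return words.stream()
                .map(w -> w.charAt(0) == 'A' ? 1 : 0)
                .peek(seen::add)                // side effect used for debugging only
                .mapToLong(Integer::longValue)
                .sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("A", "B", "A", "A", "C", "A", "A");
        List<Integer> seen = new ArrayList<>();

        long count = countAWords(words, seen);

        System.out.println("Flags: " + seen);
        System.out.println("Count: " + count);
    }
}
```

Here the list of 1's and 0's really exists because the mapping is a separate step; in the three-argument reduce from the question no such list is ever built.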

[Disclaimer: I realize this is probably not the best way to write this code; it's just for practice!].

The idiomatic way to achieve your task would be to filter the stream and then simply use count:

long countOfAWords = results.stream().filter(s -> s.charAt(0) == 'A').count();

Hope it helps! :)

Alexis C. answered Sep 24 '22 17:09


One way to see what's going on is to replace the method reference Long::sum with a lambda that includes a println.

List<String> results = Arrays.asList("A", "B", "A", "A", "C", "A", "A");
Long countOfAWords = results.stream().reduce(
        0L,
        (a, b) -> b.charAt(0) == 'A' ? a + 1 : a,
        (a, b) -> {
            System.out.println(a + " " + b);
            return Long.sum(a, b);
        });

In this case, we can see that the combiner is not actually used. This is because the stream is not parallel. All we are really doing is using the accumulator to successively combine each String with the current Long result; no two Long values are ever combined.

If you replace stream with parallelStream, you can see that the combiner is used and look at the values it combines.

Paul Boddington answered Sep 24 '22 17:09