I was looking at Collectors.toSet
implementation under jdk-8 and pretty much saw the obvious thing:
public static <T> Collector<T, ?, Set<T>> toSet() {
return new CollectorImpl<>(
(Supplier<Set<T>>) HashSet::new,
Set::add,
(left, right) -> { left.addAll(right); return left; }, // combiner
CH_UNORDERED_ID);
Look at the combiner
for a moment; this has been discussed before here, but the idea is that a combiner folds from the second argument into the first
. And that obviously happens here.
But then I looked into jdk-9
implementation and saw this:
public static <T> Collector<T, ?, Set<T>> toSet() {
return new CollectorImpl<>(
(Supplier<Set<T>>) HashSet::new,
Set::add,
(left, right) -> {
if (left.size() < right.size()) {
right.addAll(left); return right;
} else {
left.addAll(right); return left;
}
},
CH_UNORDERED_ID);
Now why this happens is a bit obvious - it takes less time to add less elements to a bigger Set, then the other way around
. But is that really cheaper than plain addAll
, consider the extra overhead for the branch too?
Also this breaks my law about always folding left...
Can someone shed some light here?
The combiner function of a Collector
will receive the left
and right
appropriately, if there is an encounter order to maintain, however, it is up to the Collector
, how it will actually combine these two arguments.
The documentation states:
A function that accepts two partial results and merges them. The combiner function may fold state from one argument into the other and return that, or may return a new result container.
For collecting into a List
, it would be disastrous if we just swap left.addAll(right)
to right.addAll(left)
, but for an unordered Set
, it doesn’t matter. The toSet()
collector even reports the UNORDERED
characteristic to hint to the Stream
(or any client code) that it won’t even matter which argument is provided as left
or right
, so a parallel stream could combine arbitrary partial results, whatever has finished first, in other words, it may behave like an unordered stream, even if the source had an encounter order (Java 8’s implementation doesn’t use that opportunity).
Regarding whether it is worth it… We are comparing a single additional branch with potentially thousands of add
operations we can save, each of them bearing multiple conditional branches internally…
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With