Why does the following code:
StringBuilder sb22 = IntStream
.range(1, 101)
.filter(x -> x > 50)
.boxed()
.parallel()
.collect(// object that is used in accumulator to do accumulating on
StringBuilder::new,
// use object from above and call append on it with each stream element as argument
(sb, a) -> sb.append(":" + a),
// (executes only when using parallel!)
(sb1, sb2) -> {
System.out.println(Thread.currentThread().getId() + " " + "sb1=" + sb1 + " AND " + "sb2=" + sb2);
sb1.append("-"+sb2);
});
produces this result:
------------------:51:52:53-:54:55:56-:57:58:59-:60:61:62-:63:64:65-:66:67:68-:69:70:71-:72:73-:74:75-:76:77:78-:79:80:81-:82:83:84-:85:86:87-:88:89:90-:91:92:93-:94:95:96-:97:98-:99:100
shouldn't first part (------------------
) be excluded from the output?
Also, I understood that combiner in collect can potentially be called out of order so it is possible to have instead :76:77:78-:79:80:81
e.g. :63:64:65-:79:80:81
?
UPDATE (after @Holger response)
This is the tree generated using code he linked for this case:
[51..100]
_________________________________________________________________________________/\______________________________________________________________________
| |
(empty) [51..100]
___________________________________/\__________________________________ ________________________________________/\______________________________________
| | | |
(empty) (empty) [51..75] [76..100]
___________________/\______________ ___________________/\______________ ______________________/\________________ ______________________/\________________
| | | | | | | |
(empty) (empty) (empty) (empty) [51..62] [63..75] [76..87] [88..100]
_______/\______ ___________/\______ _______/\______ ___________/\______ ________/\_______ _____________/\_______ ________/\_______ _____________/\_______
| | | | | | | | | | | | | | | |
(empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) [51..56] [57..62] [63..68] [69..75] [76..81] [82..87] [88..93] [94..100]
___/\__ ___/\__ ___/\__ _______/\__ ___/\__ ___/\__ ___/\__ _______/\__ ___/\___ ___/\___ ___/\___ ________/\__ ___/\___ ___/\___ ___/\___ ________/\___
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
(empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) (empty) [51..53] [54..56] [57..59] [60..62] [63..65] [66..68] [69..71] [72..75] [76..78] [79..81] [82..84] [85..87] [88..90] [91..93] [94..96] [97..100]
___/\__ ___/\__ ___/\___ ____/\__
| | | | | | | |
(empty) (empty) (empty) (empty) [72..73] [74..75] [97..98] [99..100]
The workload splitting happens before anything has been processed, hence, the Stream implementation will split the range [1, 101]
into subranges to be processed. At this point, it doesn’t know that the filter will remove the first half entirely, it can’t know without evaluating the predicate and that should already happen in parallel, hence, after the workload splitting.
Therefore, each subrange gets processed the same way, including collecting the results into a container and combining those containers afterwards, even if they happen to be empty. The specification doesn’t say that the combining step will be skipped when no elements arrived at the collector, hence, you shouldn’t expect that. While in theory, it would be possible to track whether any elements reached the collector, this tracking would only serve a specific case and it’s not even clear whether combining a container with an empty container (like adding an empty List
or appending an empty StringBuilder
) is more expensive than this tracking.
Of course, nothing stops you from optimizing your combiner, if it retains the semantics, e.g. instead of (sb1, sb2) -> sb1.append(sb2)
, you could use (sb1, sb2) -> sb1.length()==0? sb2: sb1.append(sb2)
You can look at this Q&A, “Visualization of Java Stream parallelization” for more details.
It looks an attempted optimization resulted in some unnecessary StringBuilders being created to process x < 51
. These builders never accumulated strings due to the filter, but even though they were empty they were still concatenated to the others. Maybe with smarter optimization some of that work could have been eliminated.
Regarding your second question, if you wanted to swap order only during the concatenation you would write sb2.append(sb1)
, although that would create unreliable results since you are appending in a different order, and this inconsistent behavior would go against the contract.
You have broken associativity in sb1.append("-"+sb2)
, this is stated by the docs. So, in parallel execution what you get is really unknown/unpredictable.
A correct combiner
would be for example StringBuilder::append
or as lambda :
(left, right) -> left.append(right)
They can't be out of order, they will preserve the order (whatever that order is). For example if you would have streamed from a HashSet
(which does not have any order) you would get a different result. Potentially, using java-9 and Set.of
this result will differ from run to run.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With