Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java stream - collect combiner

Why does the following code:

StringBuilder sb22 = IntStream
     .range(1, 101)
     .filter(x -> x > 50)
     .boxed()
     .parallel()
     .collect(// object that is used in accumulator to do accumulating on
              StringBuilder::new, 
              // use object from above and call append on it with each stream element as argument
              (sb, a) -> sb.append(":" + a),  
              // (executes only when using parallel!)
              (sb1, sb2) -> { 
                     System.out.println(Thread.currentThread().getId() + "  " + "sb1=" + sb1 + " AND " + "sb2=" + sb2); 
                     sb1.append("-"+sb2);
              });

produces this result:

------------------:51:52:53-:54:55:56-:57:58:59-:60:61:62-:63:64:65-:66:67:68-:69:70:71-:72:73-:74:75-:76:77:78-:79:80:81-:82:83:84-:85:86:87-:88:89:90-:91:92:93-:94:95:96-:97:98-:99:100

shouldn't first part (------------------) be excluded from the output?

Also, I understood that combiner in collect can potentially be called out of order so it is possible to have instead :76:77:78-:79:80:81 e.g. :63:64:65-:79:80:81?

UPDATE (after @Holger response)

This is the tree generated using code he linked for this case:

                                                                                                                                                    [51..100]                                                                                                                                                     
                                                                        _________________________________________________________________________________/\______________________________________________________________________                                                                                 
                                                                       |                                                                                                                                                         |                                                                                
                                                                    (empty)                                                                                                                                                 [51..100]                                                                             
                                    ___________________________________/\__________________________________                                                                              ________________________________________/\______________________________________                                         
                                   |                                                                       |                                                                            |                                                                                |                                        
                                (empty)                                                                 (empty)                                                                     [51..75]                                                                         [76..100]                                    
                ___________________/\______________                                     ___________________/\______________                                       ______________________/\________________                                         ______________________/\________________                       
               |                                   |                                   |                                   |                                     |                                        |                                       |                                        |                      
            (empty)                             (empty)                             (empty)                             (empty)                              [51..62]                                 [63..75]                                [76..87]                                 [88..100]                  
        _______/\______                 ___________/\______                     _______/\______                 ___________/\______                      ________/\_______                   _____________/\_______                       ________/\_______                   _____________/\_______              
       |               |               |                   |                   |               |               |                   |                    |                 |                 |                      |                     |                 |                 |                      |             
    (empty)         (empty)         (empty)             (empty)             (empty)         (empty)         (empty)             (empty)             [51..56]          [57..62]          [63..68]               [69..75]              [76..81]          [82..87]          [88..93]               [94..100]         
    ___/\__         ___/\__         ___/\__         _______/\__             ___/\__         ___/\__         ___/\__         _______/\__              ___/\___          ___/\___          ___/\___          ________/\__               ___/\___          ___/\___          ___/\___          ________/\___         
   |       |       |       |       |       |       |           |           |       |       |       |       |       |       |           |            |        |        |        |        |        |        |            |             |        |        |        |        |        |        |             |        
(empty) (empty) (empty) (empty) (empty) (empty) (empty)     (empty)     (empty) (empty) (empty) (empty) (empty) (empty) (empty)     (empty)     [51..53] [54..56] [57..59] [60..62] [63..65] [66..68] [69..71]     [72..75]      [76..78] [79..81] [82..84] [85..87] [88..90] [91..93] [94..96]     [97..100]     
                                                            ___/\__                                                                 ___/\__                                                                         ___/\___                                                                         ____/\__     
                                                           |       |                                                               |       |                                                                       |        |                                                                       |        |    
                                                        (empty) (empty)                                                         (empty) (empty)                                                                [72..73] [74..75]                                                                [97..98] [99..100]
like image 988
Bojan Vukasovic Avatar asked Mar 05 '18 13:03

Bojan Vukasovic


3 Answers

The workload splitting happens before anything has been processed, hence, the Stream implementation will split the range [1, 101] into subranges to be processed. At this point, it doesn’t know that the filter will remove the first half entirely, it can’t know without evaluating the predicate and that should already happen in parallel, hence, after the workload splitting.

Therefore, each subrange gets processed the same way, including collecting the results into a container and combining those containers afterwards, even if they happen to be empty. The specification doesn’t say that the combining step will be skipped when no elements arrived at the collector, hence, you shouldn’t expect that. While in theory, it would be possible to track whether any elements reached the collector, this tracking would only serve a specific case and it’s not even clear whether combining a container with an empty container (like adding an empty List or appending an empty StringBuilder) is more expensive than this tracking.

Of course, nothing stops you from optimizing your combiner, if it retains the semantics, e.g. instead of (sb1, sb2) -> sb1.append(sb2), you could use (sb1, sb2) -> sb1.length()==0? sb2: sb1.append(sb2)

You can look at this Q&A, “Visualization of Java Stream parallelization” for more details.

like image 148
Holger Avatar answered Sep 28 '22 01:09

Holger


It looks an attempted optimization resulted in some unnecessary StringBuilders being created to process x < 51. These builders never accumulated strings due to the filter, but even though they were empty they were still concatenated to the others. Maybe with smarter optimization some of that work could have been eliminated.

Regarding your second question, if you wanted to swap order only during the concatenation you would write sb2.append(sb1), although that would create unreliable results since you are appending in a different order, and this inconsistent behavior would go against the contract.

like image 34
Patrick Parker Avatar answered Sep 28 '22 02:09

Patrick Parker


You have broken associativity in sb1.append("-"+sb2), this is stated by the docs. So, in parallel execution what you get is really unknown/unpredictable.

A correct combiner would be for example StringBuilder::append or as lambda :

(left, right) -> left.append(right)

They can't be out of order, they will preserve the order (whatever that order is). For example if you would have streamed from a HashSet (which does not have any order) you would get a different result. Potentially, using java-9 and Set.of this result will differ from run to run.

like image 28
Eugene Avatar answered Sep 28 '22 02:09

Eugene