I understand that there is overhead in setting up the processing of a parallel Stream, and that processing in a single thread is faster if there are few items or the processing of each item is fast. But is there a similar threshold for trySplit(), a point where decomposing a problem into smaller chunks is counterproductive? I'm thinking by analogy to a merge sort switching to insertion sort for the smallest chunks.
If so, does the threshold depend on the relative cost of trySplit() and of consuming an item in the course of tryAdvance()? Consider a split operation that's a lot more complicated than advancing an array index (splitting a lexically-ordered multiset permutation, for example). Is there a convention for letting clients specify the lower limit for a split when creating a parallel stream, depending on the complexity of their consumer? Or a heuristic the Spliterator can use to estimate the lower limit itself?
Or, alternatively, is it always safe to let the lower limit of a Spliterator be 1, and let the work-stealing algorithm take care of deciding whether to continue splitting or not?
In general, you have no idea how much work is done in the consumer passed to tryAdvance or forEachRemaining. Neither the stream pipeline nor the Fork/Join pool knows this, as it depends on user-supplied code. It can be either much faster or much slower than the splitting procedure. For example, you may have a two-element input where the processing of each element takes one hour, so splitting even this input is very reasonable.
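As a toy sketch of that point (the class name and `heavyWork` are invented here; the busy loop just stands in for the hour-long task), even a two-element parallel stream can benefit from splitting when each element is expensive:

```java
import java.util.stream.Stream;

public class HeavyConsumerDemo {
    // Stand-in for expensive per-element work (the "one hour" task)
    static long heavyWork(int seed) {
        long acc = seed;
        for (int i = 0; i < 50_000_000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Only two elements, but each is expensive, so splitting pays off:
        long sum = Stream.of(1, 2)
                         .parallel()
                         .mapToLong(HeavyConsumerDemo::heavyWork)
                         .sum();
        System.out.printf("sum=%d in %d ms%n", sum,
                (System.nanoTime() - start) / 1_000_000);
    }
}
```

On a machine with at least two free cores, the two `heavyWork` calls run concurrently, roughly halving the wall-clock time compared to the sequential version.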
I usually split the input as much as I can. There are three tricks that can be used to improve the splitting:
If it's hard to split evenly, but you can track (or at least roughly estimate) the size of each sub-part, feel free to split unevenly. The stream implementation will split the bigger part further on its own. Don't forget about the SIZED and SUBSIZED characteristics.
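Here is a minimal sketch of that idea (the class and the 1:4 split ratio are made up for illustration): a range spliterator that deliberately splits unevenly but reports exact sizes, so the framework keeps subdividing the larger suffix by itself:

```java
import java.util.Spliterator;
import java.util.function.IntConsumer;
import java.util.stream.StreamSupport;

// Hypothetical spliterator over [from, to) that splits unevenly (about
// 1/4 : 3/4) yet reports exact sizes via SIZED and SUBSIZED, so the
// stream implementation keeps splitting the bigger part on its own.
class UnevenRangeSpliterator implements Spliterator.OfInt {
    private int from;
    private final int to;

    UnevenRangeSpliterator(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public OfInt trySplit() {
        int remaining = to - from;
        if (remaining < 2)
            return null;
        // Uneven split point; max(1, ...) guarantees a non-empty prefix
        int mid = from + Math.max(1, remaining / 4);
        OfInt prefix = new UnevenRangeSpliterator(from, mid);
        this.from = mid;
        return prefix;
    }

    @Override
    public boolean tryAdvance(IntConsumer action) {
        if (from >= to)
            return false;
        action.accept(from++);
        return true;
    }

    @Override
    public long estimateSize() { return to - from; }   // exact, not estimated

    @Override
    public int characteristics() { return ORDERED | SIZED | SUBSIZED; }

    public static void main(String[] args) {
        long sum = StreamSupport
                .intStream(new UnevenRangeSpliterator(0, 1000), true)
                .sum();
        System.out.println(sum); // 0 + 1 + ... + 999 = 499500
    }
}
```

Despite the skewed splits, the parallel sum is still correct, because the exact sizes let the framework keep balancing the work.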
Move the hard part of splitting to the next tryAdvance/forEachRemaining call. For example, suppose you have a known number of permutations and in trySplit you are going to jump to another permutation. Something like this:
public class MySpliterator implements Spliterator<String> {
    private long position;
    private String currentPermutation;
    private final long limit;

    MySpliterator(long position, long limit, String currentPermutation) {
        this.position = position;
        this.limit = limit;
        this.currentPermutation = currentPermutation;
    }

    @Override
    public Spliterator<String> trySplit() {
        if (limit - position <= 1)
            return null;
        long newPosition = (position + limit) >>> 1;
        Spliterator<String> prefix =
            new MySpliterator(position, newPosition, currentPermutation);
        this.position = newPosition;
        this.currentPermutation = calculatePermutation(newPosition); // hard part
        return prefix;
    }

    ...
}
Instead, you can move the hard part to the next tryAdvance call like this:
@Override
public Spliterator<String> trySplit() {
    if (limit - position <= 1)
        return null;
    long newPosition = (position + limit) >>> 1;
    Spliterator<String> prefix =
        new MySpliterator(position, newPosition, currentPermutation);
    this.position = newPosition;
    this.currentPermutation = null;
    return prefix;
}

@Override
public boolean tryAdvance(Consumer<? super String> action) {
    if (currentPermutation == null)
        currentPermutation = calculatePermutation(position); // hard part
    ...
}
This way the hard part will also be executed in parallel with the prefix processing.
If you don't have many elements left in the current spliterator (for example, fewer than 10) and a split was requested, it's probably good to just advance through half of your elements, collecting them into an array, then create an array-based spliterator for this prefix (similarly to how it's done in AbstractSpliterator.trySplit()). Here you control all the code, so you can measure in advance how much slower your normal trySplit is than tryAdvance, and estimate the threshold at which you should switch to the array-based split.