If I have a parallel stream in java 8, and I terminate with an anyMatch, and my collection has an element that matches the predicate, I'm trying to figure out what happens when one thread processes this element.
I know that anyMatch is short circuiting, so that I wouldn't expect further elements to be processed once the matching element is processed. My confusion is about what happens to the other threads, that are presumably in the middle of processing elements. I can think of 3 plausible scenarios: a) Do they get interrupted? b) Do they keep processing the element that they are working on, and then, once all the threads are doing nothing, I get my result? c) Do I get my result, but the threads that were processing other elements continue processing those elements (but don't take on other elements once they are done)?
I have a long running predicate, where it is very useful to terminate quickly as soon as I know that one element matches. I worry a bit since I can't find this information in the documentation that it might be an implementation dependent thing, which would also be good to know.
Thanks
1. Parallel Streams can actually slow you down. Java 8 brings the promise of parallelism as one of the most anticipated new features.
Parallel stream is configured to use as many threads as the number of cores in the computer or VM on which the program is running. To illustrate this, consider the above same program run with 15 numbers in the list, which is 15 tasks run in parallel.
The idea is to create a custom fork-join pool with a desirable number of threads and execute the parallel stream within it. This allows developers to control the threads that parallel stream uses. Additionally, it separates the parallel stream thread pool from the application pool which is considered a good practice.
Parallel streams provide the capability of parallel processing over collections that are not thread-safe. It is although required that one does not modify the collection during the parallel processing.
After some digging through the Java source code I think I found the answer.
The other threads periodically check to see if another thread has found the answer and if so, then they stop working and cancel any not yet running nodes.
java.util.Stream.FindOps$FindTask
has this method:
private void foundResult(O answer) {
if (isLeftmostNode())
shortCircuit(answer);
else
cancelLaterNodes();
}
Its parent class, AbstractShortcircuitTask
implements shortCircuit
like this:
/**
* Declares that a globally valid result has been found. If another task has
* not already found the answer, the result is installed in
* {@code sharedResult}. The {@code compute()} method will check
* {@code sharedResult} before proceeding with computation, so this causes
* the computation to terminate early.
*
* @param result the result found
*/
protected void shortCircuit(R result) {
if (result != null)
sharedResult.compareAndSet(null, result);
}
And the actual compute()
method that does the work has this important line:
AtomicReference<R> sr = sharedResult;
R result;
while ((result = sr.get()) == null) {
...//does the actual fork stuff here
}
where sharedResult
is updated by the shortCircuit()
method so the compute will see it the next time it checks the while loop condition.
EDIT So in summary:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With