How do Java 8 parallel streams behave when an exception is thrown in the consuming clause, for example in forEach handling? For example, the following code:
final AtomicBoolean throwException = new AtomicBoolean(true);
IntStream.range(0, 1000)
    .parallel()
    .forEach(i -> {
        // Throw only on one of the threads.
        if (throwException.compareAndSet(true, false)) {
            throw new RuntimeException("One of the tasks threw an exception. Index: " + i);
        }
    });
Does it stop handling elements immediately? Does it wait for the already started elements to finish? Does it wait for the whole stream to finish? Does it start handling new stream elements after the exception is thrown?
When does it return? Immediately after the exception? After all/part of the elements were handled by the consumer?
Do elements continue being handled after the parallel stream threw the exception? (Found a case where this happened).
Is there a general rule here?
EDIT (15-11-2016)
Trying to determine whether the parallel stream returns early, I found that the behavior is not deterministic:
@Test
public void testParallelStreamWithException() {
    AtomicInteger overallCount = new AtomicInteger(0);
    AtomicInteger afterExceptionCount = new AtomicInteger(0);
    AtomicBoolean throwException = new AtomicBoolean(true);
    try {
        IntStream.range(0, 1000)
            .parallel()
            .forEach(i -> {
                overallCount.incrementAndGet();
                afterExceptionCount.incrementAndGet();
                try {
                    System.out.println(i + " Sleeping...");
                    Thread.sleep(1000);
                    System.out.println(i + " After Sleeping.");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                // Throw only on one of the threads and not on main thread.
                if (!Thread.currentThread().getName().equals("main") && throwException.compareAndSet(true, false)) {
                    System.out.println("Throwing exception - " + i);
                    throw new RuntimeException("One of the tasks threw an exception. Index: " + i);
                }
            });
        Assert.fail("Should not get here.");
    } catch (Exception e) {
        System.out.println("Caught Exception. Resetting the afterExceptionCount to zero - 0.");
        afterExceptionCount.set(0);
    }
    System.out.println("Overall count: " + overallCount.get());
    System.out.println("After exception count: " + afterExceptionCount.get());
}
Late return when the exception is not thrown from the main thread. This caused a lot of new elements to be handled well after the exception was thrown. On my machine, about 200 elements were handled after the exception was thrown. BUT, not all 1000 elements were handled. So what's the rule here? Why were more elements handled even though the exception was thrown?
Early return when removing the not (!) sign, causing the exception to be thrown in the main thread. Only the already started elements finished processing and no new ones were handled. Returning early was the case here. Not consistent with the previous behavior.
What am I missing here?
Instead, you have three primary approaches: add a try/catch block to the lambda expression; create an extracted method, as in the unchecked example; or write a wrapper method that catches checked exceptions and rethrows them as unchecked.
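The wrapper-method approach mentioned above could look roughly like the following sketch. The names ThrowingFunction, unchecked and parsePositive are hypothetical and chosen only for illustration; they are not part of the JDK.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper names, used only to illustrate the wrapper approach.
class CheckedLambdaDemo {

    /** A function whose body may throw a checked exception. */
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    /** Adapts a throwing function to java.util.function.Function. */
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> f) {
        return t -> {
            try {
                return f.apply(t);
            } catch (RuntimeException e) {
                throw e;                       // already unchecked, pass through
            } catch (Exception e) {
                throw new RuntimeException(e); // checked exception becomes the cause
            }
        };
    }

    /** Example operation that declares a checked exception. */
    static int parsePositive(String s) throws IOException {
        int value = Integer.parseInt(s);
        if (value < 0) {
            throw new IOException("negative value: " + s);
        }
        return value;
    }

    static List<Integer> parseAll(List<String> input) {
        return input.stream()
                .map(unchecked(CheckedLambdaDemo::parsePositive))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("1", "2", "3"))); // prints [1, 2, 3]
    }
}
```

The lambda passed to map stays a one-liner, and the original checked exception remains available through getCause() on the RuntimeException.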
Parallel streams use the common ForkJoinPool instance, obtained via the static ForkJoinPool.commonPool() method. A parallel stream takes advantage of the available CPU cores and processes tasks in parallel. If the number of tasks exceeds the number of cores, the remaining tasks wait for currently running tasks to complete.
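One way to see which threads a parallel stream uses is to record the thread name for each element, as in this minimal sketch. The class and method names are made up for the example, and the exact set of names depends on the machine and JDK.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class CommonPoolThreadsDemo {

    /** Runs a parallel stream and records the name of every thread that processed an element. */
    static Set<String> threadsUsed(int elements) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Typically the calling thread plus common pool workers; the set varies between runs.
        System.out.println(threadsUsed(10_000));
    }
}
```

On a typical JDK the calling thread shows up in the output alongside the common pool workers, which matters for the question of where an exception is thrown.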
You can execute streams in serial or in parallel. When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
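As a small illustration of partitioning and combining, the sketch below sums the same range sequentially and in parallel. For an associative operation such as sum, the two results agree even though the parallel version combines partial sums from several substreams. The class name is an arbitrary choice for the example.

```java
import java.util.stream.IntStream;

class ParallelSumDemo {

    static int sequentialSum(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    static int parallelSum(int n) {
        // The range is split into substreams, each is summed, and the partial sums are combined.
        return IntStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum(1000)); // prints 500500
        System.out.println(parallelSum(1000));   // prints 500500
    }
}
```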
Starting with Java 8, streams have also made parallelism idiomatic. A stream's parallel() call uses the ForkJoinPool, and it does so in a functional manner: the internals handle the how of parallelism, while client code declares what it wishes to parallelize.
Though, keep in mind that parallelStream() is just a shortcut for:
The BaseStream interface defines a parallel() method as one which: "Returns an equivalent stream that is parallel. May return itself, either because the stream was already parallel, or because the underlying stream state was modified to be parallel."
Streams give us the flexibility to iterate over a list in parallel and can compute an aggregate quickly. A stream in Java is sequential by default unless parallel execution is explicitly requested.
When an exception is thrown in one of the stages, the stream does not wait for other operations to finish; the exception is re-thrown to the caller. That is how ForkJoinPool handles it.
In contrast, findFirst, for example, when run in parallel, presents the result to the caller only after ALL operations have finished processing (even if the result is known before all operations have finished).
Put another way: it will return early, but will leave all the running tasks to finish.
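The early return described above can be observed with a sketch like the following. It assumes a single failing element (index 500, an arbitrary choice) and counts how many elements had started by the time the exception reached the caller. The class and method names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class ParallelExceptionDemo {

    /**
     * Returns how many elements had started processing when the exception
     * reached the caller, or -1 if forEach completed without throwing.
     */
    static int startedWhenExceptionReachedCaller() {
        AtomicInteger started = new AtomicInteger();
        try {
            IntStream.range(0, 1000)
                    .parallel()
                    .forEach(i -> {
                        started.incrementAndGet();
                        if (i == 500) {
                            throw new IllegalStateException("failed at index " + i);
                        }
                    });
            return -1; // not expected here, since element 500 always throws
        } catch (RuntimeException e) {
            // The exception may be a re-created instance of the same type, so only the type is relied on.
            return started.get();
        }
    }

    public static void main(String[] args) {
        // The printed number usually varies between runs and machines.
        System.out.println("started before return: " + startedWhenExceptionReachedCaller());
    }
}
```

The count is usually well below 1000 and changes from run to run, which is consistent with the non-deterministic behavior reported in the question.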
EDIT to answer the last comment
This is very much explained by Holger's answer (link in comments), but here are some details.
1) When killing all BUT the main thread, you are also killing all the tasks that were supposed to be handled by those threads. So that number should actually be closer to 250, as there are 1000 tasks and 4 threads. I assume this returns 3?
int result = ForkJoinPool.getCommonPoolParallelism();
Theoretically there are 1000 tasks and 4 threads, each supposed to handle 250 tasks. You then kill 3 of them, meaning 750 tasks are lost. There are 250 tasks left to execute, and ForkJoinPool will spawn 3 new threads to execute these remaining 250 tasks.
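To check the thread count assumed above on a given machine, a sketch like this prints both numbers. By default the common pool parallelism is one less than the number of available processors (the calling thread also takes part in the work), and it can be overridden with the java.util.concurrent.ForkJoinPool.common.parallelism system property. The class name is arbitrary.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolParallelismDemo {

    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cores);
        // With default settings on a 4-core machine this is expected to be 3.
        System.out.println("common pool parallelism: " + commonParallelism());
    }
}
```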
Here are a few things you can try. Change your stream like this (making the stream not sized):
IntStream.generate(random::nextInt).limit(1000).parallel().forEach
This time, many more operations would finish, because the initial split index is unknown and is chosen by some other strategy. Another thing you could try is to change this:
if (!Thread.currentThread().getName().equals("main") && throwException.compareAndSet(true, false)) {
to this:
if (!Thread.currentThread().getName().equals("main")) {
This time you would always kill all threads besides main, up to a certain point where ForkJoinPool creates no new threads because the task is too small to split, so there is no need for other threads. In this case even fewer tasks would finish.
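The sized versus not sized distinction used in point 1 can be checked through the spliterator characteristics, as in this sketch. It only inspects the sources and does not run the experiment itself; the class and method names are made up for the example.

```java
import java.util.Random;
import java.util.Spliterator;
import java.util.stream.IntStream;

class SizedSourceDemo {

    /** IntStream.range knows its exact size up front, so it can be split evenly. */
    static boolean rangeIsSized() {
        return IntStream.range(0, 1000)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    /** A generated stream is backed by an infinite source; limit() does not make it SIZED. */
    static boolean generatedIsSized() {
        Random random = new Random();
        return IntStream.generate(random::nextInt)
                .limit(1000)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        System.out.println("range sized: " + rangeIsSized());
        System.out.println("generate + limit sized: " + generatedIsSized());
    }
}
```

Without a known size the framework cannot split the work into equal halves up front, which is why the number of finished operations differs between the two variants.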
2) In your second example, where you actually kill the main thread, the code as written does not let you see the other threads still running. Change it to:
} catch (Exception e) {
    System.out.println("Caught Exception. Resetting the afterExceptionCount to zero - 0.");
    afterExceptionCount.set(0);
}
// Give the other threads some time to finish their work. Try commenting
// and uncommenting this line to see a big difference in results.
TimeUnit.SECONDS.sleep(60);
System.out.println("Overall count: " + overallCount.get());
System.out.println("After exception count: " + afterExceptionCount.get());