Method 1
The usual approach: very fast, and it works great.
public static int loops = 500;
private static ExecutorService customPool = Executors.newFixedThreadPool(loops);
.
.
Instant start = Instant.now();
LongSummaryStatistics stats = LongStream.range(0, loops).boxed()
.map(number -> CompletableFuture.supplyAsync(() -> DummyProcess.slowNetworkCall(number), customPool))
.collect(Collectors.toList()).stream() // collect first, else will be sequential
.map(CompletableFuture::join)
.mapToLong(Long::longValue)
.summaryStatistics();
log.info("cf completed in :: {}, summaryStats :: {} ", Duration.between(start, Instant.now()).toMillis(), stats);
// ... cf completed in :: 1054, summaryStats :: LongSummaryStatistics{count=500, sum=504008, min=1000, average=1008.016000, max=1017}
I understand that if I don't collect the stream first, then, because streams are lazy, the CompletableFutures will be created and joined one by one, and the pipeline will behave synchronously. So, as an experiment:
Method 2
Remove the intermediate collect step, but also make the stream parallel:
Instant start = Instant.now();
LongSummaryStatistics stats = LongStream.range(0, loops).boxed()
.parallel()
.map(number -> CompletableFuture.supplyAsync(() -> DummyProcess.slowNetworkCall(number), customPool))
.map(CompletableFuture::join) // direct join
.mapToLong(Long::longValue).summaryStatistics();
log.info("cfps_directJoin completed in :: {}, summaryStats :: {} ", Duration.between(start, Instant.now()).toMillis(), stats);
// ... cfps_directJoin completed in :: 8098, summaryStats :: LongSummaryStatistics{count=500, sum=505002, min=1000, average=1010.004000, max=1015}
Summary:
A pattern I observed: the calls seem to complete in batches rather than all at once.
Why is there this batching of calls in the parallel + direct join approach?
For completion, here's my dummy network call method:
public static Long slowNetworkCall(Long i) {
    Instant start = Instant.now();
    log.info(" {} going to sleep..", i);
    try {
        TimeUnit.MILLISECONDS.sleep(1000); // 1 second
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    log.info(" {} woke up..", i);
    return Duration.between(start, Instant.now()).toMillis();
}
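The effect of the intermediate collect described in Method 1 can be demonstrated with a minimal, self-contained sketch (class and method names here are hypothetical, and a 100 ms sleep stands in for the 1 s network call): without the collect, each future is joined before the next is created, so the total time grows linearly; with the collect, all futures are started before any join.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class LazyJoinDemo {

    // Big enough that the custom pool never limits concurrency in this demo.
    static final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Stand-in for slowNetworkCall: sleeps 100 ms and returns its input.
    static long slowCall(long i) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return i;
    }

    // No intermediate collect: laziness means each future is joined
    // before the next one is created, so this takes ~n * 100 ms.
    static long directJoinMillis(int n) {
        Instant start = Instant.now();
        LongStream.range(0, n).boxed()
                .map(i -> CompletableFuture.supplyAsync(() -> slowCall(i), pool))
                .map(CompletableFuture::join)
                .forEach(x -> { });
        return Duration.between(start, Instant.now()).toMillis();
    }

    // Collect first: all futures are created (and started) before any
    // join, so the sleeps overlap and this takes ~100 ms for small n.
    static long collectFirstMillis(int n) {
        Instant start = Instant.now();
        List<CompletableFuture<Long>> futures = LongStream.range(0, n).boxed()
                .map(i -> CompletableFuture.supplyAsync(() -> slowCall(i), pool))
                .collect(Collectors.toList());
        futures.forEach(CompletableFuture::join);
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) {
        int n = 5;
        System.out.println("direct join:   " + directJoinMillis(n) + " ms");
        System.out.println("collect first: " + collectFirstMillis(n) + " ms");
        pool.shutdown();
    }
}
```

With n = 5 the direct-join variant should take roughly 500 ms and the collect-first variant roughly 100 ms.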
list.stream() works in sequence on a single thread, while list.parallelStream() is processed in parallel, taking full advantage of the underlying multicore environment; the interesting aspect is which thread ends up handling which element.
The CompletableFuture.get() method (like join()) is blocking: it waits until the future is completed and returns the result after its completion.
In the case of a parallel stream, several worker threads are used simultaneously; internally it uses the common ForkJoinPool to create and manage those threads.
Usually, Java code has a single stream of processing and is executed sequentially. With parallel streams, the work is split across more than one thread, executed in parallel on separate cores, and the end result is the combination of the individual results.
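This can be made visible by recording which threads touch the elements (a minimal sketch; the class name is hypothetical and the exact set of worker threads is machine-dependent):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StreamThreadNames {

    // Returns the set of thread names that processed the elements.
    static Set<String> threadsUsed(boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        List<Integer> list = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        (parallel ? list.parallelStream() : list.stream())
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Sequential: only the calling thread appears.
        System.out.println("stream():         " + threadsUsed(false));
        // Parallel: typically the caller plus ForkJoinPool.commonPool-worker-* threads.
        System.out.println("parallelStream(): " + threadsUsed(true));
    }
}
```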
This is an artifact of how ForkJoinPool handles things when you block its inner threads, and of how many new ones it spawns. Though I could probably find the exact lines where this happens, I am not sure it is worth it, for two reasons: that logic can change, and the code inside ForkJoinPool is far from trivial.
It seems that for both of us, ForkJoinPool.commonPool().getParallelism() returns 11, so I get the same results as you do. If you log ForkJoinPool.commonPool().getPoolSize() to find out how many active threads your code is using, you will see that after a certain period it stabilizes at 64. So the maximum number of tasks that can be processed at the same time is 64, which is on par with the result you see (64 threads blocking for 1 second each over 500 tasks gives roughly those 8 seconds).
If I run your code with -Djava.util.concurrent.ForkJoinPool.common.parallelism=50, it is now executed in 2 seconds, and the pool size is increased to 256. That means there is internal logic that adjusts these kinds of things.
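One way to observe this inflation yourself is to print the common pool's parallelism and pool size around a parallel stream whose join() calls block the workers (a hedged sketch; the class name is hypothetical and the numbers you see depend on your machine and JDK):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolInflation {

    // Runs `tasks` sleeping futures on a custom pool, joining them from a
    // parallel stream so the join() calls block common-pool workers.
    // Returns the sum of the results as a sanity check.
    static long runBlockedJoins(int tasks) {
        ExecutorService customPool = Executors.newFixedThreadPool(tasks);
        long sum = LongStream.range(0, tasks).boxed()
                .parallel()
                .map(i -> CompletableFuture.supplyAsync(() -> {
                    try {
                        Thread.sleep(200);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return i;
                }, customPool))
                .mapToLong(CompletableFuture::join)
                .sum();
        customPool.shutdown();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("parallelism:      " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("pool size before: " + ForkJoinPool.commonPool().getPoolSize());
        runBlockedJoins(64);
        // The pool compensates for blocked workers by spawning extra threads,
        // so this is typically larger than the parallelism.
        System.out.println("pool size after:  " + ForkJoinPool.commonPool().getPoolSize());
    }
}
```

Comparing "pool size after" with the configured parallelism (and re-running with the -Djava.util.concurrent.ForkJoinPool.common.parallelism flag) shows the same adjustment behavior described above.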