
Calling sequential on parallel stream makes all previous operations sequential

I've got a significant set of data, and I want to call a slow but clean method on it and then call a fast method with side effects on the result of the first one. I'm not interested in intermediate results, so I would like not to collect them.

The obvious solution is to create a parallel stream, make the slow call, make the stream sequential again, and make the fast call. The problem is that ALL the code executes in a single thread; there is no actual parallelism.

Example code:

@Test
public void testParallelStream() throws ExecutionException, InterruptedException
{
    ForkJoinPool forkJoinPool = new ForkJoinPool(Runtime.getRuntime().availableProcessors() * 2);
    Set<String> threads = forkJoinPool.submit(()-> new Random().ints(100).boxed()
            .parallel()
            .map(this::slowOperation)
            .sequential()
            .map(Function.identity())//some fast operation, but must be in single thread
            .collect(Collectors.toSet())
    ).get();
    System.out.println(threads);
    Assert.assertEquals(Runtime.getRuntime().availableProcessors() * 2, threads.size());
}

private String slowOperation(int value)
{
    try
    {
        Thread.sleep(100);
    }
    catch (InterruptedException e)
    {
        e.printStackTrace();
    }
    return Thread.currentThread().getName();
}

If I remove sequential(), the code executes as expected, but, obviously, the non-parallel operation is then called from multiple threads.

Could you recommend some references about such behavior, or maybe some way to avoid temporary collections?

the20login asked Mar 02 '16 09:03



2 Answers

Switching a stream from parallel() to sequential() partway through the pipeline worked in the initial Stream API design, but it caused many problems, and the implementation was eventually changed so that these calls simply turn the parallel flag on or off for the whole pipeline. The current documentation is indeed vague, but it was improved in Java 9:

The stream pipeline is executed sequentially or in parallel depending on the mode of the stream on which the terminal operation is invoked. The sequential or parallel mode of a stream can be determined with the BaseStream.isParallel() method, and the stream's mode can be modified with the BaseStream.sequential() and BaseStream.parallel() operations. The most recent sequential or parallel mode setting applies to the execution of the entire stream pipeline.
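The "most recent setting wins" rule quoted above can be observed directly with isParallel(). The following is only a minimal sketch; the class and method names are made up for illustration and are not part of the question's code:

```java
import java.util.stream.IntStream;

public class StreamModeDemo {
    // Hypothetical helper: builds a pipeline that calls parallel() and then
    // sequential(), and reports the mode the whole pipeline would run in.
    public static boolean modeAfterParallelThenSequential() {
        return IntStream.range(0, 10)
                .parallel()
                .map(i -> i * 2)
                .sequential()   // last mode setting: sequential
                .isParallel();
    }

    // Hypothetical helper: same pipeline with the two calls swapped.
    public static boolean modeAfterSequentialThenParallel() {
        return IntStream.range(0, 10)
                .sequential()
                .map(i -> i * 2)
                .parallel()     // last mode setting: parallel
                .isParallel();
    }

    public static void main(String[] args) {
        System.out.println(modeAfterParallelThenSequential());
        System.out.println(modeAfterSequentialThenParallel());
    }
}
```

The first call should report false and the second true, which is consistent with the mode being a single flag for the entire pipeline rather than a per-stage property.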

As for your problem, you can collect everything into an intermediate List and start a new sequential pipeline:

new Random().ints(100).boxed()
        .parallel()
        .map(this::slowOperation)
        .collect(Collectors.toList())
        // Start new stream here
        .stream()
        .map(Function.identity())//some fast operation, but must be in single thread
        .collect(Collectors.toSet());

Tagir Valeev answered Oct 21 '22 12:10


In the current implementation, a Stream is either all parallel or all sequential. The Javadoc isn't explicit about this, and it could change in the future, but it does say this is possible:

S parallel()

Returns an equivalent stream that is parallel. May return itself, either because the stream was already parallel, or because the underlying stream state was modified to be parallel.

If you need the function to be single threaded, I suggest you use a Lock or synchronized block/method.
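A rough sketch of the synchronized variant might look like the following. All names here (SynchronizedSideEffectDemo, fastOperationWithSideEffects, the sink list) are placeholders, and slowOperation is a trivial stand-in for the real slow call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class SynchronizedSideEffectDemo {
    // Deliberately not thread-safe; every access goes through the lock.
    private final List<Integer> sink = new ArrayList<>();
    private final Object lock = new Object();

    // Stand-in for the slow, side-effect-free operation (assumption).
    private int slowOperation(int value) {
        return value * 2;
    }

    // The fast operation with side effects; only one thread at a time
    // can be inside the synchronized block.
    private void fastOperationWithSideEffects(int value) {
        synchronized (lock) {
            sink.add(value);
        }
    }

    public List<Integer> run(int n) {
        IntStream.range(0, n)
                .parallel()
                .map(this::slowOperation)
                .forEach(this::fastOperationWithSideEffects);
        synchronized (lock) {
            return new ArrayList<>(sink);
        }
    }

    public static void main(String[] args) {
        System.out.println(new SynchronizedSideEffectDemo().run(1000).size());
    }
}
```

Note that this only guarantees mutual exclusion: the fast operation may still run on different threads, one at a time, and in no particular order. If it truly must run on one specific thread, collecting into an intermediate List as in the other answer is the simpler route.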

Peter Lawrey answered Oct 21 '22 13:10