
Java 8 parallel stream takes more time

I am trying to learn about Java 8 parallel streams. I have written the code below, first using an Executor and then using a parallel stream. The parallel stream takes twice as long (10 seconds) as the Executor approach (5 seconds). I expected the parallel stream to show similar performance. Any idea why it takes double the time? My computer has 8 cores.

package com.shashank.java8.parallel_stream;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * @author pooja
 *
 */
public class Sample {

    public static int processUrl(String url) {

        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see the interruption
            Thread.currentThread().interrupt();
        }
        System.out.println("Running Thread " + Thread.currentThread());
        return url.length();
    }

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        usingExecutor();
        usingParallelStream();
    }

    public static void usingParallelStream() {

        Date start = new Date();
        // Process every URL on the parallel stream and sum the results
        int total = buildUrlsList().parallelStream().mapToInt(Sample::processUrl).sum();
        Date end = new Date();
        System.out.println(total);
        System.out.println((end.getTime() - start.getTime()) / 1000);

    }

    public static void usingExecutor() throws Exception {
        Date start = new Date();
        ExecutorService executorService = Executors.newFixedThreadPool(100);
        List<Future<Integer>> futures = new ArrayList<>();

        for (String url : buildUrlsList()) {
            futures.add(executorService.submit(() -> processUrl(url)));

        }

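        // Wait for every task and sum the results
        int total = 0;
        for (Future<Integer> future : futures) {
            total += future.get();
        }
        // Shut the pool down; its non-daemon threads would otherwise keep the JVM alive
        executorService.shutdown();
        System.out.println(total);
        Date end = new Date();
        System.out.println((end.getTime() - start.getTime()) / 1000);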

    }

    public static List<String> buildUrlsList() {
        return Arrays.asList("url1", "url2", "url3", "url4", "url5", "url6", "url7", "url8", "url9");

    }

}
shanks asked May 08 '17 17:05


1 Answer

The explanation is quite simple. You have 8 cores, so parallelStream() will normally spread the work over 8 threads: by default the common ForkJoinPool has one worker thread fewer than the number of cores (7 here), and the calling thread takes part as well. They all grab a task immediately and sleep for 5 seconds. Then one of them takes the remaining (9th) task and sleeps for 5 more seconds. Only then is the processing done. That makes about 5 seconds (8 threads) + 5 seconds (1 thread) = 10 seconds in total. Let's see this in action. I'll modify your code slightly:

public static int processUrl(String url) {

    try {
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        // Restore the interrupt flag so callers can see the interruption
        Thread.currentThread().interrupt();
    }
    // Print the thread ID and the completion time in seconds since the epoch
    System.out.println("T[" + Thread.currentThread().getId() + "] finished @[" + System.currentTimeMillis() / 1000 + "]");
    return url.length();
}

With the parallel stream you may get output similar to:

T[1] finished @[1494267500]
T[12] finished @[1494267500]
T[17] finished @[1494267500]
T[13] finished @[1494267500]
T[14] finished @[1494267500]
T[16] finished @[1494267500]
T[11] finished @[1494267500]
T[15] finished @[1494267500]
T[12] finished @[1494267505]
36
10

Note that the same thread, T[12], runs two tasks: its second one finishes 5 seconds after the first 'round' of 8 tasks. T[1] is the main thread, which also takes part in processing the stream.
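
The arithmetic above can be sketched as a small helper. This is only an illustration and not part of the original code: the class name RoundEstimate and the method rounds are hypothetical, and the estimate assumes that every thread handles one sleeping task at a time and that the calling thread works alongside the common pool's workers. Real stream splitting does not promise such an even distribution.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper, not from the original post.
class RoundEstimate {

    // Number of sequential "rounds" needed when each thread runs one blocking task at a time.
    static int rounds(int tasks, int threads) {
        return (tasks + threads - 1) / threads; // ceiling division
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int workers = ForkJoinPool.getCommonPoolParallelism();
        // Assumption: the thread that starts the stream also processes elements.
        int streamThreads = workers + 1;
        System.out.println("cores=" + cores + ", common pool workers=" + workers);
        // Each round costs one 5-second sleep, as in the question.
        System.out.println("estimated seconds for 9 tasks: " + rounds(9, streamThreads) * 5);
    }
}
```

With 9 tasks and 8 threads the helper gives 2 rounds, which matches the 10 seconds measured above; with 9 threads it gives a single round.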

With the executor you have created a pool of up to 100 threads, so each of the 9 tasks gets its own thread and the execution time is about 5 seconds because the thread pool is never exhausted:

T[14] finished @[1494267783]
T[11] finished @[1494267783]
T[19] finished @[1494267783]
T[17] finished @[1494267783]
T[12] finished @[1494267783]
T[16] finished @[1494267783]
T[13] finished @[1494267783]
T[15] finished @[1494267783]
T[18] finished @[1494267783]
36
5

Note that no thread ID appears twice here. (This is NOT a recommendation for choosing a universal number of threads for a fixed pool :-) I'm just elaborating on your actual question.)

Experiment with the executor and give it just 8 threads:

ExecutorService executorService = Executors.newFixedThreadPool(8);

Then the execution times are likely to be roughly the same, because this thread pool will be exhausted too. You will also see similar performance from both approaches if there are just 8 URLs instead of 9.
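
A quick way to try this is to time the same 9 sleeping tasks on pools of 8 and 9 threads. The sketch below is an assumption-laden stand-in for the code in the question: PoolSizeDemo, timeTasks and SLEEP_MS are hypothetical names, and the sleep is shortened to 200 ms so the experiment runs quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class, not from the original post.
class PoolSizeDemo {

    // Assumption: shortened from the 5000 ms used in the question.
    static final long SLEEP_MS = 200;

    // Runs `tasks` sleeping tasks on a fixed pool of `threads` threads and returns the elapsed milliseconds.
    static long timeTasks(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> {
                    try {
                        Thread.sleep(SLEEP_MS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            // Wait for every task to finish before stopping the clock
            for (Future<?> future : futures) {
                future.get();
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("9 tasks, 8 threads: " + timeTasks(9, 8) + " ms");
        System.out.println("9 tasks, 9 threads: " + timeTasks(9, 9) + " ms");
    }
}
```

With 8 threads the ninth task has to wait for a free thread, so the run should take roughly two sleep periods; with 9 threads it should take roughly one. Exact timings depend on the machine.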

OF COURSE there is no guarantee that this code will behave the same across different environments.

Lachezar Balev answered Oct 25 '22 23:10
