
Concurrent Execution: Future vs parallelStream

I wrote a Callable that polls a remote client for information and returns it as a List. I'm using a ThreadPoolExecutor, a for loop, and Future to run the task in parallel against multiple remote clients. I then combine all of the Future results with addAll() and work with the one combined list.

My question is: would parallelStream() be more efficient here than Future and a for loop? It's certainly easier to code! If I went that route, would I still need the ThreadPoolExecutor?

Thank you!

        for(SiteInfo site : active_sites) {
            TAG_SCANNER scanr = new TAG_SCANNER(site, loggr);
            Future<List<TagInfo>> result = threadmaker.submit(scanr);

            //SOUND THE ALARMS
            try {
                alarm_tags.addAll(result.get());
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }

Possible solution code? NetBeans is suggesting something along these lines:

        active_sites.parallelStream()
            .map((site) -> new TAG_SCANNER(site, loggr))
            .map((scanr) -> threadmaker.submit(scanr))
            .forEach((result) -> {
                //SOUND THE ALARMS
                try {
                    alarm_tags.addAll(result.get());
                } catch (InterruptedException | ExecutionException e) {
                    e.printStackTrace();
                }
            });

TheFunk asked Dec 07 '22 23:12


2 Answers

There are several misconceptions here. First, an asynchronous task does not improve your resource utilization if you call Future.get right after submitting it: you then wait for each task to complete before the next one is even submitted, so the tasks run one after another.
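To make the first point concrete, here is a minimal sketch that counts how many tasks are ever running at the same time. Everything in it is hypothetical: `poll` is a stand-in for the remote scan that only sleeps, and the class and method names are made up for illustration. With the submit-then-get loop the peak is always 1; with submit-all-first it is typically higher, up to the pool size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class SubmitThenGetDemo {

    /** Hypothetical stand-in for a remote poll: records the concurrency level, then sleeps. */
    static Callable<Integer> poll(AtomicInteger running, AtomicInteger peak) {
        return () -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(50); // simulated network wait
            } finally {
                running.decrementAndGet();
            }
            return now;
        };
    }

    /** Submits and waits immediately, like the loop in the question. */
    static int peakWhenWaitingImmediately(int tasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                Future<Integer> f = pool.submit(poll(running, peak));
                f.get(); // blocks here, so the next task is not submitted yet
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    /** Submits every task first and only then waits for the results. */
    static int peakWhenSubmittingFirst(int tasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(poll(running, peak)));
            }
            for (Future<Integer> f : futures) {
                f.get(); // all tasks are already queued or running
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("wait immediately: peak = " + peakWhenWaitingImmediately(4));
        System.out.println("submit first:     peak = " + peakWhenSubmittingFirst(4));
    }
}
```

The pool has as many threads as tasks in both variants, so the difference comes only from where Future.get is called.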

Second, the code transformation suggested by NetBeans produces mostly equivalent code that still submits tasks to an Executor. So it’s not a matter of “Future vs parallelStream”: the parallel stream only performs the submission (and the waiting), while the executor still runs the tasks. Given your first error, doing the submission in parallel might improve throughput, but it is never a good idea to combine two mistakes so that they cancel each other out, and it remains a poor solution for the following reason:

The standard implementation of the Stream API is optimized for CPU-bound tasks. It runs parallel streams on a number of threads that roughly matches the number of CPU cores and does not spawn new threads when those threads get blocked in a wait operation. So using parallel streams for I/O operations, or generally for operations which may wait, is not a good choice. You also have no control over the threads used by the implementation.
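If you want to see this on your own machine, the following sketch prints the core count, the parallelism of the common ForkJoinPool (which the standard implementation uses for parallel streams, by default one less than the number of available processors, with the calling thread helping as well), and how many distinct threads actually ran a batch of blocking operations. The class and method names are hypothetical, and the sleep is only a stand-in for an I/O wait.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolDemo {

    /** Runs blocking stand-in tasks on a parallel stream; returns the names of the threads used. */
    static Set<String> threadsUsed(int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, tasks).parallel().forEach(i -> {
            names.add(Thread.currentThread().getName());
            try {
                Thread.sleep(20); // simulated I/O wait; the thread is simply blocked
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return names;
    }

    public static void main(String[] args) {
        System.out.println("available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("threads used for 64 blocking tasks: "
                + threadsUsed(64).size());
    }
}
```

On a typical setup the last number stays close to the core count even though there are 64 waiting tasks, which is the limitation described above.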

The better choice is to stay with the ExecutorService, which you can configure according to the expected I/O bandwidth to your remote clients. But you should fix the error of waiting immediately after submission: submit all tasks first and wait for their completion afterwards. Note that you can use the Stream API for that, not for better parallelism, but to potentially improve readability:

// first, submit all tasks, assuming "threadmaker" is an ExecutorService
// (invokeAll blocks until all tasks have completed and may throw InterruptedException)
List<Future<List<TagInfo>>> futures=threadmaker.invokeAll(
    active_sites.stream()
        .map(site -> new TAG_SCANNER(site, loggr))
        .collect(Collectors.toList())
);
// now fetch all results
for(Future<List<TagInfo>> result: futures) {
    //SOUND THE ALARMS
    try {
        alarm_tags.addAll(result.get());
    } catch (InterruptedException | ExecutionException e) {
        // not a recommended way of handling
        // but I keep your code here for simplicity
        e.printStackTrace();
    }
}

Note that the Stream API is used sequentially here, only to convert your list of SiteInfo to a list of Callable<List<TagInfo>>; you could do the same with a loop.
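For readers who want something they can run as-is, here is a self-contained sketch of the same pattern. It is not the original code: `scanner` is a hypothetical stand-in for TAG_SCANNER that sleeps instead of polling a remote client, sites are plain strings, and the pool size is an arbitrary parameter you would tune to your I/O situation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class InvokeAllSketch {

    /** Hypothetical stand-in for TAG_SCANNER: "polls" a site and returns its tags. */
    static Callable<List<String>> scanner(String site) {
        return () -> {
            Thread.sleep(20); // simulated network latency
            return Arrays.asList(site + ":tag1", site + ":tag2");
        };
    }

    /** Scans all sites on a fixed-size pool and combines the results in site order. */
    static List<String> scanAll(List<String> sites, int poolSize)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // sequential stream, used only to build the list of tasks
            List<Callable<List<String>>> tasks = sites.stream()
                    .map(InvokeAllSketch::scanner)
                    .collect(Collectors.toList());

            List<String> combined = new ArrayList<>();
            // invokeAll returns the futures in the same order as the tasks,
            // after all of them have completed
            for (Future<List<String>> result : pool.invokeAll(tasks)) {
                try {
                    combined.addAll(result.get());
                } catch (ExecutionException e) {
                    // one failed site should not discard the other results
                    System.err.println("scan failed: " + e.getCause());
                }
            }
            return combined;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(scanAll(Arrays.asList("siteA", "siteB", "siteC"), 3));
    }
}
```

Only the main thread touches the combined list, so it does not need to be thread-safe.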

Holger answered Dec 27 '22 17:12


In general, parallelStream was written by very smart programmers to do parallel processing very effectively.

As with the rest of Java's threading support, such as the concurrency package, unless you are an expert in the subject, code you write yourself is likely to:

  • Run slower
  • Introduce bugs
  • Be more complex and harder to follow

In other words: yes, use parallelStream.

Tim B answered Dec 27 '22 18:12