
How beneficial is Parallel Seq for executing a sequence of statements?

I have a small program using List.par:

val x = List(1,2,3,4,5).par.map(y => {
    Thread.sleep(2000)
    println(y)
    y + 1
})

println(x)

Output:

3
1
4
5
2
ParVector(2, 3, 4, 5, 6)

The numbers are printed in parallel (in no particular order); however, the returned collection always keeps its order.

My aim is to execute a sequence of insert statements against a SQL database in parallel.

Currently I am using a for comprehension. I want to switch to ParSeq because the number of statements is increasing.

But I am afraid it might degrade performance: if the map implementation contains extra code to preserve the order, that would be overhead.

Kindly suggest how to do this.

Shantiswarup Tunga asked May 24 '19 13:05



1 Answer

The documentation ("Semantics" section) explains that there are only two scenarios that might lead to out-of-order behaviour:

  1. Side-effecting operations can lead to non-determinism
  2. Non-associative operations lead to non-determinism

You have observed the first one yourself with the println statements. The second one is easy to test with a non-associative binary operation such as subtraction:

val list = (1 to 100).toList
// Subtraction is not associative, so the result depends on how the work is grouped
val a = list.par.reduce(_ - _)

println(a)

Try running the above snippet a couple of times; the printed value may differ between runs.

A list of integers can be mapped in parallel by several workers because the elements don't depend on each other. Each worker can apply the operation to its own elements without affecting any other element. So even if it's not intuitive at first, such processing does benefit from parallelization (though for the improvement to be noticeable you will probably need a larger number of elements).
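A minimal sketch of this point, assuming Scala 2.13+ with the separate scala-parallel-collections module on the classpath (on Scala 2.12 the import is not needed and should be dropped). The names ParMapOrder and incrementAll are only illustrative. With a pure function, the parallel map gives the same elements in the same positions as the sequential one:

```scala
import scala.collection.parallel.CollectionConverters._ // Scala 2.13+ only (assumption)

object ParMapOrder {
  // Pure function mapped in parallel: no printing, no shared mutable state.
  def incrementAll(input: List[Int]): List[Int] =
    input.par.map(_ + 1).toList

  def main(args: Array[String]): Unit = {
    val input = (1 to 1000).toList
    // Elements may be computed in any order, but each result keeps
    // the position of its input element.
    println(incrementAll(input) == input.map(_ + 1)) // prints true
  }
}
```

Only the side effect (println in the question) exposes the order of execution; the returned collection does not.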

However, that same list cannot be reduced in parallel with a non-associative operation, because the elements do depend on each other, and it makes a big difference whether you do:

1 - (2 - (3 - 4))

or

((1 - 2) - 3) - 4
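These two groupings can be checked with the standard sequential reduceRight and reduceLeft on List(1, 2, 3, 4). This is only an illustration of why grouping matters for subtraction (no parallel collection is involved), and the object name Grouping is hypothetical:

```scala
object Grouping {
  val xs = List(1, 2, 3, 4)

  // reduceRight groups from the right: 1 - (2 - (3 - 4))
  val rightGrouped: Int = xs.reduceRight(_ - _)

  // reduceLeft groups from the left: ((1 - 2) - 3) - 4
  val leftGrouped: Int = xs.reduceLeft(_ - _)

  def main(args: Array[String]): Unit = {
    println(rightGrouped) // -2
    println(leftGrouped)  // -8
  }
}
```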

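This is why parallel processing of a collection can usually run reduce and fold in parallel, but not foldLeft and foldRight, which impose a fixed evaluation order and cannot be split across workers in the same way.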
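By contrast, an associative operation such as addition gives the same result however the work is grouped. A minimal sketch, assuming Scala 2.13+ with the scala-parallel-collections module available (on Scala 2.12, drop the import); the object and method names are illustrative:

```scala
import scala.collection.parallel.CollectionConverters._ // Scala 2.13+ only (assumption)

object AssociativeReduce {
  val list = (1 to 100).toList

  // Addition is associative, so the grouping chosen by the workers does not matter.
  def parallelSum: Int = list.par.reduce(_ + _)

  // Sequential reference result.
  def sequentialSum: Int = list.foldLeft(0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(parallelSum)   // 5050
    println(sequentialSum) // 5050
  }
}
```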

slouc answered Oct 04 '22 13:10
