Performance of scala parallel collection processing

Tags: parallel-processing, scala, scala-collections

I have scenarios where I need to process thousands of records at a time. Sometimes it is only in the hundreds; sometimes it may be up to 30000 records. I was thinking of using Scala's parallel collections, so to understand the difference, I wrote a simple program like the one below:

object Test extends App{
  val list = (1 to 100000).toList
  Util.seqMap(list)
  Util.parMap(list)
}

object Util{
  def seqMap(list:List[Int]) = {
    val start = System.currentTimeMillis
    list.map(x => x + 1).toList.sum
    val end = System.currentTimeMillis
    println("time taken =" + (end - start))
    end - start
  }
  def parMap(list:List[Int]) = {
    val start = System.currentTimeMillis
    list.par.map(x => x + 1).toList.sum
    val end = System.currentTimeMillis
    println("time taken=" + (end - start))
    end - start
  }
}

I expected the parallel version to be faster. However, the output I got was:

time taken =32
time taken=127

Machine configuration:

Intel i7 processor with 8 cores
16GB RAM
64-bit Windows 8

What am I doing wrong? Is this not a correct scenario for parallel mapping?

asked Feb 13 '15 10:02

Yadu Krishnan

1 Answer

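The issue is that the operation you are performing is so fast (just adding two ints) that the overhead of parallelization is greater than the benefit. Parallelization only pays off when each operation is expensive enough to outweigh the cost of splitting the collection into tasks, scheduling those tasks on a thread pool, and combining the results.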
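A minimal sketch of the cost argument, assuming Scala 2.13 or later with the scala-parallel-collections module on the classpath (on Scala 2.12 and earlier, .par is built in and the import can be dropped). The names CostDemo, cheap, expensive, seqSum, parSum and timeMs are hypothetical, and the loop inside expensive is only an arbitrary stand-in for real per-record work. It also warms up the JIT before measuring, since a single cold run like the one in the question can be dominated by compilation and class loading. Exact timings depend on the machine, but you would typically expect the parallel version to lose on cheap and win on expensive.

```scala
// Assumes Scala 2.13+ with the scala-parallel-collections module.
// On Scala 2.12 and earlier, .par is built in: remove this import.
import scala.collection.parallel.CollectionConverters._

object CostDemo {
  // Cheap per-element work, as in the question: the overhead dominates.
  def cheap(x: Int): Long = x + 1L

  // Hypothetical stand-in for expensive per-record work: a tight loop
  // whose cost is large relative to the cost of scheduling a task.
  def expensive(x: Int): Long = {
    var acc = 0L
    var i = 0
    while (i < 2000) {
      acc += (x.toLong * i) % 7
      i += 1
    }
    acc
  }

  def seqSum(data: Vector[Int], f: Int => Long): Long = data.map(f).sum
  def parSum(data: Vector[Int], f: Int => Long): Long = data.par.map(f).sum

  // Evaluates `body` once and returns its result with the elapsed milliseconds.
  def timeMs[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val data = Vector.range(0, 30000) // the upper bound mentioned in the question
    // Warm up the JIT so the timed runs are not dominated by compilation.
    for (_ <- 1 to 5) {
      seqSum(data, cheap); parSum(data, cheap)
      seqSum(data, expensive); parSum(data, expensive)
    }
    val (s1, tSeqCheap) = timeMs(seqSum(data, cheap))
    val (p1, tParCheap) = timeMs(parSum(data, cheap))
    val (s2, tSeqExp) = timeMs(seqSum(data, expensive))
    val (p2, tParExp) = timeMs(parSum(data, expensive))
    assert(s1 == p1 && s2 == p2, "sequential and parallel results must agree")
    println(s"cheap:     seq = $tSeqCheap ms, par = $tParCheap ms")
    println(s"expensive: seq = $tSeqExp ms, par = $tParExp ms")
  }
}
```

For a real benchmark, a harness such as JMH gives more trustworthy numbers than hand-rolled System.nanoTime measurements; the sketch only aims to show the direction of the effect.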

Think of it this way: suppose you had 8 friends. You give each of them an integer on a piece of paper and tell them to add one, write down the result, and hand it back to you, and you record each result before giving them the next integer. You would spend so much time passing notes back and forth that you could have done all the adding faster yourself.
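To make the analogy concrete, here is a hypothetical sketch that uses plain scala.concurrent Futures rather than parallel collections. perElement submits one task per integer, like handing each friend a single note; perChunk gives each task a whole slice of the data. Both compute the same sum, but on most machines you would expect perElement to be much slower, because the coordination cost is paid once per element. Parallel collections already split their work into chunks internally, so this only illustrates the coordination overhead; it is not how .par is implemented.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object HandOff {
  // One Future per element: the hand-off cost is paid for every integer.
  def perElement(data: Vector[Int]): Long = {
    val futures = data.map(x => Future(x + 1L))
    Await.result(Future.sequence(futures), 1.minute).sum
  }

  // One Future per chunk: each task does many additions per hand-off.
  def perChunk(data: Vector[Int], chunks: Int): Long = {
    require(chunks > 0, "chunks must be positive")
    val size = math.max(1, (data.length + chunks - 1) / chunks) // ceiling division
    val futures = data.grouped(size).toVector.map(chunk => Future(chunk.map(_ + 1L).sum))
    Await.result(Future.sequence(futures), 1.minute).sum
  }

  def main(args: Array[String]): Unit = {
    val data = Vector.range(0, 100000)
    val t0 = System.nanoTime()
    val a = perElement(data)
    val t1 = System.nanoTime()
    val b = perChunk(data, 8) // 8 chunks, one per core on the asker's machine
    val t2 = System.nanoTime()
    println(s"per element: ${(t1 - t0) / 1000000} ms, per chunk: ${(t2 - t1) / 1000000} ms, same result: ${a == b}")
  }
}
```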

ALSO: Never call .par on a List. A List cannot be split efficiently, so .par has to copy the entire list into a parallel collection, and calling .toList on the result copies the whole thing back out again. A Vector converts to a parallel collection without copying, so it avoids this extra work.
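A minimal sketch of the question's parMap reworked along these lines, assuming Scala 2.13 or later with the scala-parallel-collections module on the classpath (on Scala 2.12 and earlier, .par is built in and the import can be dropped). The object and method names are hypothetical. The records are converted to a Vector once, up front, and the sum is taken directly on the parallel collection, so there is no copy on the way in and no .toList copy on the way out.

```scala
// Assumes Scala 2.13+ with the scala-parallel-collections module.
// On Scala 2.12 and earlier, .par is built in: remove this import.
import scala.collection.parallel.CollectionConverters._

object ListVsVector {
  // Mirrors the question's parMap: List -> parallel copy -> List copy -> sum.
  def parSumFromList(list: List[Int]): Long =
    list.par.map(x => x + 1L).toList.sum

  // Vector.par wraps the vector without copying, and summing the parallel
  // result directly avoids converting back to a sequential collection.
  def parSumFromVector(vector: Vector[Int]): Long =
    vector.par.map(x => x + 1L).sum

  def main(args: Array[String]): Unit = {
    val records = (1 to 100000).toVector // convert once, reuse for every parallel pass
    println(parSumFromList(records.toList) == parSumFromVector(records))
  }
}
```

For an operation as cheap as x + 1, this removes the copying overhead, but the sequential version may still win for the reason given at the start of this answer.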

answered Sep 24 '22 21:09

dhg
