I have scenarios where I will need to process thousands of records at a time. Sometime, it might be in hundreds, may be upto 30000 records. I was thinking of using the scala's parallel collection. So just to understand the difference, I wrote a simple pgm like below:
object Test extends App{
val list = (1 to 100000).toList
Util.seqMap(list)
Util.parMap(list)
}
object Util{
def seqMap(list:List[Int]) = {
val start = System.currentTimeMillis
list.map(x => x + 1).toList.sum
val end = System.currentTimeMillis
println("time taken =" + (end - start))
end - start
}
def parMap(list:List[Int]) = {
val start = System.currentTimeMillis
list.par.map(x => x + 1).toList.sum
val end = System.currentTimeMillis
println("time taken=" + (end - start))
end - start
}
}
I expected that running in parallel will be faster. However, the output I was getting was
time taken =32
time taken=127
machine config :
Intel i7 processor with 8 cores
16GB RAM
64bit Windows 8
What am I doing wrong? Is this not a correct scenario for parallel mapping?
The design of Scala's parallel collections library is inspired by and deeply integrated with Scala's (sequential) collections library (introduced in 2.8). It provides a parallel counterpart to a number of important data structures from Scala's (sequential) collection library, including: ParArray. ParVector. mutable.
Scala programming language already brings us a collection to quickly implement parallel computing, the Parallel Collections. In this tutorial, we'll check out some concepts of parallelism with Scala and the usage of parallel collections.
The issue is that the operation you are performing is so fast (just adding two ints) that the overhead of doing the parallelization is more than the benefit. The parallelization only really makes sense if the operations are slower.
Think of it this way: if you had 8 friends and you gave each one an integer on a piece of paper and told them to add one, write the result down, and give it back to you, which you would record before giving them the next integer, you'd spend so much time passing messages back and forth that you could have just done all the adding yourself faster.
ALSO: Never do .par
on a List because the parallelization procedure has to copy the entire list into a parallel collection and then copy the whole thing back out. If you use a Vector, then it doesn't have to do this extra work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With