I'm trying to use scala parallel collections to implement some cpu-intensive task, I've wanted to abstract the way the algorithm can be executed (sequentially, parallel or even distributed), but the code dosn't work as I would suspect and I have no idea what am I doing wrong.
The way I wanted to abstract this problem is mocked below:
// just measures time a block of code runs
def time(block: => Unit) : Long = {
val start = System.currentTimeMillis
block
val stop = System.currentTimeMillis
stop - start
}
// "lengthy" task
def work = {
Thread.sleep(100)
println("done")
1
}
import scala.collection.GenSeq
abstract class ContextTransform {
def apply[T](genSeq: GenSeq[T]): GenSeq[T]
}
object ParContextTransform extends ContextTransform {
override def apply[T](genSeq: GenSeq[T]): GenSeq[T] = genSeq.par
}
// this works as expected
def callingParDirectly = {
val range = (1 to 10).par
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
// this doesn't
def callingParWithContextTransform(contextTransform: ContextTransform) = {
val range = contextTransform(1 to 10)
// make sure we really got a ParSeq
println(range)
for (i <- range) yield work
}
The result from the interpreter:
scala> time(callingParDirectly)
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res20: Long = 503
scala> time(callingParWithContextTransform(ParContextTransform))
ParRange(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
done
// ...
done
res21: Long = 1002
My first bet was that the collection doesn't split properly and the println's of "done" indeed suggest that... but the above code works well if I don't yield anything (just run the work method).
I can't understand why the callingParWithContextTransform
method doesn't work
like callingParDirectly
; what am I missing?
Possible culprit: SI-4843.
Daniel Sobral is right, this is a known bug. I can reproduce your results with Scala 2.9.1.RC3, but it's fixed in trunk. Here's a simplified version that demonstrates the slowdown:
// just measures time a block of code runs
def time(block: => Unit) : Long = {
val start = System.currentTimeMillis
block
val stop = System.currentTimeMillis
stop - start
}
// "lengthy" task
def work = {
Thread.sleep(100)
1
}
def run() {
import scala.collection.GenSeq
print("Iterating over ParRange: ")
println(time(for (i <- (1 to 10).par) yield work))
print("Iterating over GenSeq: ")
println(time(for (i <- (1 to 10).par: GenSeq[Int]) yield work))
}
run()
The output I get on 2.9.1.RC3 is
Iterating over ParRange: 202
Iterating over GenSeq: 1002
but on a nightly build of 2.10, both versions run in about 200ms.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With