Is there any way to run a Stream in parallel in Scala without loading all objects into memory?
Note: using the par method will load all objects into memory:
val list = "a"::"b"::"c"::"d"::"e"::Nil //> list: List[String] = List(a, b, c, d, e)
val s = list.toStream //> s: scala.collection.immutable.Stream[String] = Stream(a, ?)
val sq = s.par //> sq: scala.collection.parallel.immutable.ParSeq[String] = ParVector(a, b, c, d, e)
sq.map { x => println("Map 1 " + x); x }
  .map { x => println("Map 2 " + x); x }
  .map { x => println("Map 3 " + x); x }
  .foreach { x => println("done " + x) }
In general, yes, this is possible.
As Tzach Zohar commented, the .par method will eagerly load all the elements of the Stream, because "streams are inherently sequential in the sense that the elements must be accessed one after the other" (see the docs).
So you can't use the built-in parallel collections for this, but you can still process a stream in parallel by using an ExecutionContext directly, e.g.:
import scala.concurrent._
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// an infinite, lazily evaluated stream of integers
val infStream = Stream.from(1)

// Stream#map is lazy (beyond the head), so Futures are only created as elements are forced
val mappedInfStream = infStream
  .map { x => Future(println(s"processing $x on ${Thread.currentThread.getName}")) }

// forcing only the first 100 elements submits 100 tasks to the ExecutionContext
Await.result(
  Future.sequence(mappedInfStream.take(100)),
  Duration.Inf)
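Note that Future.sequence still keeps all 100 in-flight Futures in memory at once. If you need to bound memory for a much longer (or unbounded) stream, one option is to force and await the stream in fixed-size batches via grouped. A minimal sketch follows (not from the original answer; the batch size of 10 is just an illustrative choice):

import scala.concurrent._
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

val batchSize = 10 // hypothetical batch size, tune to your workload

Stream.from(1)
  .take(100)          // or leave unbounded and stop on some other condition
  .grouped(batchSize) // lazily yields batches of `batchSize` elements
  .foreach { batch =>
    val futures = batch.map { x =>
      Future(println(s"processing $x on ${Thread.currentThread.getName}"))
    }
    // wait for the current batch before forcing the next one,
    // so only `batchSize` Futures are in flight at any time
    Await.result(Future.sequence(futures), Duration.Inf)
  }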