
Scala lazy parallel collection (is it possible?)

Is there any way to process a Stream in parallel in Scala without loading all objects into memory?

Note: using the par method loads all objects into memory:

val list = "a"::"b"::"c"::"d"::"e"::Nil //> list: List[String] = List(a, b, c, d, e)

val s = list.toStream  //> s: scala.collection.immutable.Stream[String] = Stream(a, ?)
val sq = s.par         //> sq: scala.collection.parallel.immutable.ParSeq[String] = ParVector(a, b, c, d, e)
sq.map { x => println("Map 1 "+x);x }
  .map { x => println("Map 2 "+x);x}
  .map { x => println("Map 3 "+x);x }
  .foreach { x => println("done "+x)}  
asked by dsncode


1 Answer

In general, yes, this is possible.

As Tzach Zohar commented, .par eagerly loads all elements of the Stream, because "streams are inherently sequential in the sense that the elements must be accessed one after the other" (see the docs).

So you can't use the built-in parallel collections for this, but you can still process a stream in parallel by using an ExecutionContext directly, e.g.:

import scala.concurrent._
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// An infinite, lazily evaluated Stream of integers
val infStream = Stream.from(1)

// Wrap each element in a Future running on the global ExecutionContext.
// Because the Stream is lazy, the Futures are created as elements are forced.
val mappedInfStream = infStream
  .map { x => Future(println(s"processing $x on ${Thread.currentThread.getName}")) }

// Force only the first 100 elements and block until their Futures complete
Await.result(
  Future.sequence(mappedInfStream.take(100)),
  Duration.Inf)
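
One caveat: Future.sequence still creates one Future per element in the window you take, so very large windows can again build up a lot of in-flight work. If you want a bounded number of elements in flight at any time, one option (a minimal sketch of my own, not part of the original answer; processInBatches is a hypothetical helper) is to walk the stream in fixed-size batches with grouped and await each batch before forcing the next:

import scala.concurrent._
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical helper: process a (possibly infinite) Stream in batches of
// `batchSize`, so at most one batch of Futures is in flight at a time.
def processInBatches[A, B](s: Stream[A], batchSize: Int)(f: A => B): Unit =
  s.grouped(batchSize).foreach { batch =>
    // Only this batch's elements are forced and wrapped in Futures
    val futures = batch.map(a => Future(f(a)))
    Await.result(Future.sequence(futures), Duration.Inf)
  }

// Example usage; take(1000) just keeps the demo finite
processInBatches(Stream.from(1).take(1000), batchSize = 100) { x =>
  println(s"processing $x on ${Thread.currentThread.getName}")
}

Awaiting per batch trades some throughput for a hard cap on memory and scheduled tasks, which is usually the point of processing a Stream lazily in the first place.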
answered by Rich