I can't figure out the right way to compute the elements of a list in parallel while blocking the main thread until all of them have been computed. Use case: I have a list of URLs and a simple parser for HTML pages. I want to reduce the time needed to grab information from the given pages by parsing each page in parallel, and then return a plain list with some JSON data.
As I understand it, I have two options:
Concurrent way with Futures
I have a method that extracts some JSON data in a Future:
def extractData(link: String): Future[JValue] = // some implementation
and I just map it over the list of links, which gives a List[Future[JValue]]:
val res: List[Future[JValue]] = listOfLinks.map(extractData)
If I call sequence (for example from Scalaz, or my own implementation), which traverses this list and converts it to a Future[List[JValue]], then the links are still going to be processed sequentially, just on a separate thread. That won't gain me any efficiency, because in the end I need a List[JValue].
Try to compute with ParSeq
In this option I have a function that just extracts the data:
def extractData(link: String): JValue = // some implementation
but this time I call .par on the collection:
val res: ParSeq[JValue] = listOfLinks.par.map(extractData)
But this way I don't quite understand how to block the main thread until the whole list has been computed, without parsing each link sequentially.
As for Akka, I just can't use actors here, so it has to be Future or Par*.
The links will be processed in parallel when you map extractData over the collection. Consider a slightly simplified example:
import scala.concurrent._
import ExecutionContext.Implicits.global
def extractData(s: String): Future[Int] = Future {
  printf("Starting: %s\n", s)
  val i = s.toInt
  printf("Done: %s\n", s)
  i
}
val xs = (0 to 5).map(_.toString).toList
val parsed = Future.sequence(xs map extractData)
Now you'll see something like the following, which makes it clear that these things aren't being processed sequentially:
Starting: 0
Done: 0
Starting: 2
Done: 2
Starting: 1
Starting: 4
Done: 1
Starting: 3
Starting: 5
Done: 5
Done: 4
Done: 3
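The interleaving happens because a Future starts running as soon as it is created, so Future.sequence only collects results that are already being computed. Here is a minimal sketch of that behaviour, assuming a slow parse simulated with a sleep. The names EagerFutures, slowParse and timedRun are hypothetical and only serve the illustration:

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EagerFutures {
  // Hypothetical stand-in for a slow page parse: sleeps 300 ms, then converts.
  def slowParse(s: String): Future[Int] = Future {
    blocking { Thread.sleep(300) } // tells the pool this thread is blocked
    s.toInt
  }

  // Runs three parses and returns (results, elapsed time in milliseconds).
  def timedRun(): (List[Int], Long) = {
    val start = System.nanoTime()
    val futures = List("1", "2", "3").map(slowParse) // all three start here
    val results = Await.result(Future.sequence(futures), 5.seconds)
    (results, (System.nanoTime() - start) / 1000000)
  }

  def main(args: Array[String]): Unit = {
    val (results, ms) = timedRun()
    // If the parses ran one after another this would take about 900 ms;
    // running concurrently it should be much closer to 300 ms.
    println(s"$results in $ms ms")
  }
}
```

The exact timing depends on the machine and on JVM warm-up, so treat the numbers as rough.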
Note that you can use Future.traverse to avoid creating the intermediate list of futures:
val parsed = Future.traverse(xs)(extractData)
In either case you can block with Await:
val res = Await.result(parsed, duration.Duration.Inf)
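If waiting forever is not acceptable, a finite timeout can be passed instead of Duration.Inf. The following is only a sketch under that assumption: BoundedWait and parseAll are hypothetical names, and extractData is the same toy stand-in for the real parser:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BoundedWait {
  // Hypothetical stand-in for the real parser.
  def extractData(s: String): Future[Int] = Future(s.toInt)

  // Parses all inputs concurrently and blocks the caller for at most `timeout`.
  def parseAll(xs: List[String], timeout: FiniteDuration): List[Int] =
    Await.result(Future.traverse(xs)(extractData), timeout)

  def main(args: Array[String]): Unit = {
    val xs = (0 to 5).map(_.toString).toList
    println(parseAll(xs, 10.seconds)) // results come back in input order
  }
}
```

If the deadline passes first, Await.result throws a java.util.concurrent.TimeoutException. A failure in any single future (for example a NumberFormatException from toInt) is also rethrown by Await.result.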
As a footnote: I don't know if you're planning to use Dispatch to perform the HTTP requests, but if not, it's worth a look. It also provides nicely integrated JSON parsing, and the documentation is full of useful examples of how to work with futures.