
Distributing Scala over a cluster?

I've recently started learning Scala and have been using graphs as my project to improve my Scala. It's going well: thanks to Scala 2.9's excellent support for parallel collections, I've easily parallelized some graph algorithms that benefit from data parallelism.

However, I want to take this one step further and parallelize it not just on a single machine but across several. Does Scala offer a clean way to do this, like it does with parallel collections, or will I have to wait until I reach the chapter on actors in my book and learn more about Akka?

Thanks! -kstruct

asked Mar 11 '12 04:03 by adelbertc

2 Answers

There was an attempt to create distributed collections (the project is currently frozen).

Alternatives would be Akka, which you've already mentioned (and which recently got a really cool addition, Akka Cluster), or full-fledged cluster engines. Those are not parallel collections in any sense, more like distributing a cluster over Scala than Scala over a cluster, but they could be used for your task in some way: Scoobi for Hadoop, Storm, or even Spark (specifically Bagel, for graph processing). There is also Swarm, which was built on top of delimited continuations. Last but not least is Menthor: its authors claim it is especially well suited to graph processing, and it makes use of actors.

Since you're working with graphs, you may also want to look at Cassovary, which Twitter recently open-sourced.

Signal-collect is a framework for parallel data processing backed by Akka.

answered Oct 23 '22 07:10 by om-nom-nom


You can use Akka (http://akka.io). It has always been the most advanced and powerful actor and concurrency framework for Scala, and the freshly released version 2.0 supports nice transparent actor remoting, hierarchies and supervision. The canonical way to do parallel computation is to create as many actors as there are parallel parts in your algorithm (optionally spreading them over several machines), send them data to process, and then gather the results (see here).
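The scatter/gather shape described above can be sketched on a single machine with the standard library's `scala.concurrent.Future` standing in for actors. This is a swapped-in technique, not Akka's API, and it does not distribute anything across machines; it only shows the pattern of splitting the work, processing each part independently, and merging the partial results. The adjacency-list `Graph` type and the `degrees` and `parallelDegrees` helpers are hypothetical, chosen for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ScatterGather {
  // Hypothetical adjacency-list graph: vertex id -> neighbour ids.
  type Graph = Map[Int, Seq[Int]]

  // Work done by one "worker": the out-degree of every vertex in its slice.
  def degrees(slice: Seq[(Int, Seq[Int])]): Map[Int, Int] =
    slice.map { case (v, ns) => v -> ns.size }.toMap

  // Scatter the vertices into roughly `parts` slices, run each slice in its
  // own Future (a stand-in for one actor), then gather and merge the results.
  def parallelDegrees(g: Graph, parts: Int): Map[Int, Int] = {
    val vertices = g.toSeq
    val sliceSize = math.max(1, math.ceil(vertices.size.toDouble / parts).toInt)
    val futures = vertices.grouped(sliceSize).toList.map(slice => Future(degrees(slice)))
    val gathered = Future.sequence(futures).map(_.foldLeft(Map.empty[Int, Int])(_ ++ _))
    // Blocking here only keeps the demo short; real code would compose the Future.
    Await.result(gathered, 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val g: Graph = Map(1 -> Seq(2, 3), 2 -> Seq(3), 3 -> Seq(1), 4 -> Seq.empty)
    println(parallelDegrees(g, parts = 2))
  }
}
```

With Akka, each slice would instead be sent as a message to a worker actor, possibly a remote one, and the replies collected by a coordinating actor; the split/merge logic stays the same.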

answered Oct 23 '22 07:10 by Oleg Kunov