I have 50,000 tasks and want to execute them with 10 threads. In Java I would create Executors.newFixedThreadPool(10), submit Runnables to it, and then wait for everything to finish. Scala, as I understand it, is especially well suited to this kind of task, but I can't find a solution in the docs.
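For reference, the plain java.util.concurrent approach described in the question translates almost directly to Scala. This is just a sketch, with println standing in for the real work:

import java.util.concurrent.{Executors, TimeUnit}

val pool = Executors.newFixedThreadPool(10)      // 10 worker threads

(1 to 50000).foreach { i =>
  pool.submit(new Runnable {
    def run(): Unit = println("task " + i)       // stand-in for the real task
  })
}

pool.shutdown()                                  // stop accepting new tasks
pool.awaitTermination(1, TimeUnit.HOURS)         // block until everything has run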
The simplest approach is to use the scala.concurrent.Future class and its associated infrastructure. The Future { ... } construct (formerly the scala.concurrent.future method) asynchronously evaluates the block passed to it and immediately returns a Future[A] representing the asynchronous computation. Futures can be manipulated in a number of non-blocking ways, including mapping, flatMapping, filtering, recovering from errors, etc.
For example, here's a sample that creates 10 tasks, where each task sleeps an arbitrary amount of time and then returns the square of the value passed to it.
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val tasks: Seq[Future[Int]] = for (i <- 1 to 10) yield Future {
  println("Executing task " + i)
  Thread.sleep(i * 1000L)
  i * i
}

val aggregated: Future[Seq[Int]] = Future.sequence(tasks)

val squares: Seq[Int] = Await.result(aggregated, 15.seconds)
println("Squares: " + squares)
In this example, we first create a sequence of individual asynchronous tasks that, when complete, provide an Int. We then use Future.sequence to combine those async tasks into a single async task, swapping the positions of the Future and the Seq in the type. Finally, we block the current thread for up to 15 seconds while waiting for the result. In the example, we use the global execution context, which is backed by a fork/join thread pool. For non-trivial examples, you would probably want to use an application-specific ExecutionContext.
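For instance, a minimal sketch of an application-specific context backed by a fixed pool of 10 threads (the thread count from the question) might look like this; bringing it into implicit scope, instead of importing ExecutionContext.Implicits.global, makes the futures above run on that pool:

import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService}

// Hypothetical application-specific context: a fixed pool of 10 threads.
implicit val fixedPool: ExecutionContextExecutorService =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(10))

// Remember to shut it down when the application exits:
// fixedPool.shutdown()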
Generally, blocking should be avoided whenever possible. There are other combinators available on the Future class that can help you program in an asynchronous style, including onSuccess, onFailure, and onComplete.
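As a minimal sketch of that non-blocking style (assuming an implicit ExecutionContext is in scope, such as the global one imported in the example above):

import scala.concurrent.Future
import scala.util.{Failure, Success}

val task: Future[Int] = Future {
  Thread.sleep(500L)
  21 * 2
}

// Transform the eventual result without blocking the current thread.
val doubled: Future[Int] = task.map(_ * 2)

// Register a callback that runs when the computation finishes.
doubled.onComplete {
  case Success(value) => println("Result: " + value)
  case Failure(error) => println("Failed: " + error.getMessage)
}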
Also, consider investigating the Akka library, which provides actor-based concurrency for Scala and Java, and interoperates with scala.concurrent.
The simplest approach is to use Scala's Future class, which is a sub-component of the Actors framework. The scala.actors.Futures.future method creates a Future for the block passed to it. You can then use scala.actors.Futures.awaitAll to wait for all tasks to complete.
For example, here's a sample that creates 10 tasks, where each task sleeps an arbitrary amount of time and then returns the square of the value passed to it.
import scala.actors.Futures._

val tasks = for (i <- 1 to 10) yield future {
  println("Executing task " + i)
  Thread.sleep(i * 1000L)
  i * i
}

val squares = awaitAll(20000L, tasks: _*)
println("Squares: " + squares)
You want to look at either the Scala actors library or Akka. Akka has cleaner syntax, but either will do the trick.
So it sounds like you need to create a pool of actors that know how to process your tasks. An Actor can basically be any class with a receive method - from the Akka tutorial (http://doc.akkasource.org/tutorial-chat-server-scala):
class MyActor extends Actor {
  def receive = {
    case "test" => println("received test")
    case _      => println("received unknown message")
  }
}

val myActor = Actor.actorOf[MyActor]
myActor.start
You'll want to create a pool of actor instances and fire your tasks to them as messages. Here's a post on Akka actor pooling that may be helpful: http://vasilrem.com/blog/software-development/flexible-load-balancing-with-akka-in-scala/
In your case, one actor per task may be appropriate (actors are extremely lightweight compared to threads so you can have a LOT of them in a single VM), or you might need some more sophisticated load balancing between them.
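As a rough sketch of that pooling idea, in the Akka 1.x style of the snippets here and assuming the same Actor imports as the tutorial example above (the Task message and Worker actor are made up for illustration):

case class Task(i: Int)

class Worker extends Actor {
  def receive = {
    case Task(i) => println("task " + i + " -> " + (i * i))
  }
}

// A fixed pool of 10 worker actors; a real application would more likely use
// one of Akka's load-balancing dispatchers or routers than manual round-robin.
val workers = Vector.fill(10)(Actor.actorOf[Worker].start())

(1 to 50000).foreach { i =>
  workers(i % workers.size) ! Task(i)   // fire-and-forget message send
}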
EDIT: Using the example actor above, sending it a message is as easy as this:
myActor ! "test"
The actor will then output "received test" to standard output.
Messages can be of any type, and when combined with Scala's pattern matching, you have a powerful pattern for building flexible concurrent applications.
In general Akka actors will "do the right thing" in terms of thread sharing, and for the OP's needs, I imagine the defaults are fine. But if you need to, you can set the dispatcher the actor should use to one of several types:
* Thread-based
* Event-based
* Work-stealing
* HawtDispatch-based event-driven
It's trivial to set a dispatcher for an actor:
class MyActor extends Actor {
  self.dispatcher = Dispatchers.newExecutorBasedEventDrivenDispatcher("thread-pool-dispatch")
    .withNewThreadPoolWithBoundedBlockingQueue(100)
    .setCorePoolSize(10)
    .setMaxPoolSize(10)
    .setKeepAliveTimeInMillis(10000)
    .build
}
See http://doc.akkasource.org/dispatchers-scala
In this way, you could limit the thread pool size, but again, the original use case could probably be satisfied with 50K Akka actor instances using default dispatchers and it would parallelize nicely.
This really only scratches the surface of what Akka can do. It brings a lot of what Erlang offers to the Scala language. Actors can monitor other actors and restart them, creating self-healing applications. Akka also provides Software Transactional Memory and many other features. It's arguably the "killer app" or "killer framework" for Scala.