
Akka scheduling patterns

Tags: scala, akka

Consider the classic "Word Count" program. It counts the number of words in all files in a directory. The Master receives a directory and splits the job among Worker actors (each worker handles one file). This is the pseudo-code:

class WordCountWorker extends Actor {

  def receive = {
    case FileToCount(fileName:String) =>
      val count = countWords(fileName)
      sender ! WordCount(fileName, count)
  }
}

class WordCountMaster extends Actor {
  def receive = {
    case StartCounting(docRoot) => // sending each file to worker
      val workers = createWorkers()
      val fileNames = scanFiles(docRoot)
      sendToWorkers(fileNames, workers)
    case WordCount(fileName, count) => // aggregating results
      ...

  }
}

But I want to run this Word Count program on a schedule (for example, every minute), providing a different directory to scan each time.

Akka provides a nice way to schedule message passing:

system.scheduler.schedule(0.seconds, 1.minute, wordCountMaster, StartCounting(directoryName))

The problem with this scheduler arises when a tick sends a new message before the previous one has been fully processed. For example, I send a message to scan a big directory, and shortly afterwards I send another message to scan a different directory while processing of the first one is still in progress. As a result, my WordCountMaster receives WordCount messages from workers that are processing different directories.

As a workaround, instead of scheduling a message, I can schedule the execution of a code block that creates a new WordCountMaster every time, i.e. one directory = one WordCountMaster. But I think that's inefficient, and I also need to take care of providing a unique name for each WordCountMaster to avoid InvalidActorNameException.

So my question is: should I create a new WordCountMaster for each tick, as described in the paragraph above? Or are there better ideas/patterns for redesigning this program to support scheduling?


Update: when creating one Master actor per directory, I run into some problems:

  1. Problem with naming actors

InvalidActorNameException: actor name [WordCountMaster] is not unique!

and

InvalidActorNameException: actor name [WordCountWorker ] is not unique!

I can overcome this problem by simply not providing an actor name. But in that case my actors receive auto-generated names like $a, $b, etc., which is not good for me.
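
One way to keep names readable while still avoiding InvalidActorNameException is to append a counter to a fixed prefix. The following is only a minimal sketch: `ActorNames` and its `next` method are hypothetical helpers introduced here for illustration, not part of Akka.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper (not part of Akka): hands out readable, unique actor names.
object ActorNames {
  // Shared, thread-safe counter so that concurrent ticks never get the same suffix.
  private val counter = new AtomicLong(0)

  // Returns e.g. "wordCountMaster-1", "wordCountMaster-2", ...
  def next(prefix: String): String =
    s"$prefix-${counter.incrementAndGet()}"
}

// Intended use inside the scheduled block (assumes the question's actor classes):
//   system.actorOf(Props[WordCountMaster], ActorNames.next("wordCountMaster"))
```

Note that actor names only have to be unique among siblings, so workers created as children of each master can all share the same name.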

  2. Problem with config:

I want to move the configuration of my routers out into application.conf, i.e. I want to provide the same configuration to each WordCountWorker router. But since I'm not controlling the actor names, I can't use the configuration below, because I don't know the actor names:

  /wordCountWorker{
    router = smallest-mailbox-pool
    nr-of-instances = 5
    dispatcher = word-counter-dispatcher
  }
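
A possible way around the unknown names, sketched under assumptions: Akka's deployment configuration allows an asterisk as a wildcard for a whole path section. If every master is a top-level actor and creates its router as a child named `wordCountWorker` (an assumed layout, not something stated above), one entry could cover all masters regardless of their names. Check the wildcard behaviour against the documentation of the Akka version in use.

```
akka.actor.deployment {
  # "*" stands for any top-level actor name, e.g. an auto-generated or numbered master
  "/*/wordCountWorker" {
    router = smallest-mailbox-pool
    nr-of-instances = 5
    dispatcher = word-counter-dispatcher
  }
}
```

Each master would then create its router with something like `context.actorOf(FromConfig.props(Props[WordCountWorker]), "wordCountWorker")`.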

WelcomeTo asked May 25 '15 21:05

1 Answer

I am not an Akka expert, but I don't think having an actor per aggregation is inefficient. You need to keep the concurrent aggregations separated somehow. You can either give every aggregation an id and keep them separated by that id in the one and only master actor, or you can use Akka's actor naming and lifecycle logic and delegate every aggregation for every counting round to an actor that lives just for that aggregation.

To me, using one actor per aggregation seems more elegant.
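
For comparison, the id-based alternative could look roughly like the sketch below. It shows only the bookkeeping a single master might keep as its state, written as plain Scala without Akka. The `jobId` field, `Job` and `Aggregations` are assumptions made for illustration and do not appear in the question's code.

```scala
// Hypothetical message type: the question's WordCount plus an assumed jobId field.
final case class WordCount(jobId: Long, fileName: String, count: Int)

// State of one counting round: files still outstanding and the running total.
final case class Job(pending: Set[String], total: Long)

// Bookkeeping a single master could keep as its state, keyed by jobId.
final case class Aggregations(jobs: Map[Long, Job] = Map.empty) {

  // Register a new round when StartCounting arrives.
  def start(jobId: Long, files: Set[String]): Aggregations =
    copy(jobs = jobs + (jobId -> Job(files, 0L)))

  // Fold in one worker result; returns the final total once the round is complete.
  def record(wc: WordCount): (Aggregations, Option[Long]) =
    jobs.get(wc.jobId) match {
      case Some(job) if job.pending.contains(wc.fileName) =>
        val updated = Job(job.pending - wc.fileName, job.total + wc.count)
        if (updated.pending.isEmpty) (copy(jobs = jobs - wc.jobId), Some(updated.total))
        else (copy(jobs = jobs.updated(wc.jobId, updated)), None)
      case _ =>
        (this, None) // unknown round or duplicate result: ignore
    }
}
```

With this approach the workers have to echo the `jobId` back in every result, which is the extra plumbing that the actor-per-aggregation variant avoids.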

Also please note that Akka has an implementation for the aggregation pattern as described here

Gregor Raýman answered Sep 29 '22 00:09