
Get or create child Akka actor and ensure liveness

I am trying to use a hierarchy of Akka actors to handle per-user state. There is a parent actor that owns all the children and handles the get-or-create in the correct way (see a1, a2):

import akka.actor.{Actor, ActorRef, Props}

class UserActorRegistry extends Actor {
  // The method to implement is `receive`; `Receive` is only the handler's type
  override def receive: Receive = {
    case msg @ DoPerUserWork(userId, _) =>
      val perUserActor = getOrCreateUserActor(userId)
      // perUserActor is live now, but will it receive "msg"?
      perUserActor.forward(msg)
  }

  // Look the child up by name and create it only if it does not exist yet
  def getOrCreateUserActor(userId: UserId): ActorRef = {
    val childName = userId.toActorName
    context.child(childName) match {
      case Some(child) => child
      case None => context.actorOf(Props(classOf[UserActor], userId), childName)
    }
  }
}

In order to reclaim memory, the UserActors expire after a period of idleness (i.e. a timer triggers the child actor to call context.stop(self)).

My problem is that I think I have a race condition between the "getOrCreateUserActor" and the child actor receiving the forwarded message -- if the child expires in that window then the forwarded message will be lost.

Is there any way I can either detect this edge case, or refactor the UserActorRegistry to preclude it?

Rich asked May 13 '14 10:05



1 Answer

I can see two problems with your current design that open you up to the race condition you mention:

1) The termination condition (a timer sending a poison pill) goes directly to the child actor. With this approach, the child can be terminated on one dispatcher thread while, at the same time, the UserActorRegistry actor is setting up a message to forward to it on a different dispatcher thread.

2) Using a PoisonPill to terminate the child. A PoisonPill is for a graceful stop, allowing other messages already in the mailbox to be processed first. In your case, you are terminating due to inactivity, which suggests there are no other messages in the mailbox. I see a PoisonPill as wrong here because another message might be sent after the PoisonPill, and that message would be lost once the PoisonPill is processed.

So I'm going to suggest that you delegate the termination of inactive children to the UserActorRegistry instead of doing it in the children themselves. When a child detects that it is inactive, have it send a message to the UserActorRegistry indicating that this particular child needs to be terminated. When the registry receives that message, it terminates the child via stop instead of sending a PoisonPill. Because the single mailbox of the UserActorRegistry is processed serially, this helps ensure that a child is not being terminated in parallel while you are about to send it a message.
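The child side of this suggestion could look roughly like the sketch below, assuming classic Akka actors. The `ChildIdle` message, the `idleAfter` constructor parameter, the `String` alias for `UserId`, and the choice of `setReceiveTimeout` to detect idleness are all illustrative assumptions rather than anything from the question.

```scala
import akka.actor.{Actor, ReceiveTimeout}
import scala.concurrent.duration.FiniteDuration

object UserProtocol {
  type UserId = String // stand-in for the question's UserId type
  final case class DoPerUserWork(userId: UserId, payload: Any)
  // Hypothetical message: the child reports idleness instead of stopping itself
  final case class ChildIdle(userId: UserId)
}

// idleAfter is a hypothetical parameter, added so the idle period is configurable
class UserActor(userId: UserProtocol.UserId, idleAfter: FiniteDuration) extends Actor {
  import UserProtocol._

  // Ask Akka to send ReceiveTimeout to this actor after idleAfter without messages
  context.setReceiveTimeout(idleAfter)

  def receive: Receive = {
    case DoPerUserWork(_, _) =>
      () // per-user work goes here
    case ReceiveTimeout =>
      // Do not call context.stop(self); tell the parent and let it decide
      context.parent ! ChildIdle(userId)
  }
}
```

The registry would then react to `ChildIdle` by calling `context.stop` on that child from within its own `receive`, so the stop decision is serialized with the forwarding of work.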

Now, there is a complication here that you have to deal with. Stopping an actor is asynchronous. So if you call stop on a child, it might not be completely stopped when you are processing a DoPerUserWork message, and you might send it a message that will be lost because it is in the process of stopping. You can solve this by keeping some internal state (a List) that represents children that are in the process of being stopped. When you stop a child, add its name to that list and then set up DeathWatch (via context watch child) on it. When you receive the Terminated event for that child, remove its name from the list of children being terminated. If you receive work for a child while its name is in that list, requeue it for re-processing, maybe up to a max number of times so that you don't try to reprocess forever.
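A minimal sketch of that bookkeeping, assuming classic Akka actors. `ChildIdle`, `EchoUserActor`, and the `String` alias for `UserId` are hypothetical names for illustration; a `Set` is used in place of a List, and the requeue here is unbounded.

```scala
import akka.actor.{Actor, ActorRef, Props, Terminated}

object RegistryProtocol {
  type UserId = String // stand-in for the question's UserId type
  final case class DoPerUserWork(userId: UserId, payload: Any)
  final case class ChildIdle(userId: UserId) // hypothetical: sent by an idle child
}

// Placeholder child that echoes the payload so the flow can be observed
class EchoUserActor(userId: RegistryProtocol.UserId) extends Actor {
  def receive: Receive = {
    case RegistryProtocol.DoPerUserWork(_, payload) => sender() ! payload
  }
}

class UserActorRegistry extends Actor {
  import RegistryProtocol._

  // Names of children that were told to stop but have not yet terminated
  private var stopping = Set.empty[String]

  def receive: Receive = {
    case msg @ DoPerUserWork(userId, _) if stopping(userId) =>
      // Child is shutting down: requeue in our own mailbox, keeping the
      // original sender. No retry limit in this sketch.
      self.forward(msg)

    case msg @ DoPerUserWork(userId, _) =>
      getOrCreateUserActor(userId).forward(msg)

    case ChildIdle(userId) =>
      context.child(userId).foreach { child =>
        stopping += userId
        context.watch(child) // DeathWatch: Terminated(child) arrives once it is gone
        context.stop(child)  // asynchronous, hence the bookkeeping above
      }

    case Terminated(child) =>
      stopping -= child.path.name
  }

  // Assumes the user id is usable as an actor name as-is
  private def getOrCreateUserActor(userId: UserId): ActorRef =
    context.child(userId).getOrElse(
      context.actorOf(Props(classOf[EchoUserActor], userId), userId))
}
```

Note that this sketch does not cover work that the registry forwards to a child after the child has sent `ChildIdle` but before the registry handles it; that work can still be dropped when the child is stopped.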

This is not a perfect solution; it's just an identification of some of the issues with your approach and a push in the right direction for solving some of them. Let me know if you want to see the code for this and I'll whip something together.

Edit

In response to your second comment: I don't think you'll be able to look at a child ActorRef and see that it's currently shutting down, hence the need for that list of children that are in the process of being shut down. You could enhance the DoPerUserWork message to contain a numberOfAttempts:Int field, then increment it and send the message back to self for reprocessing if you see that the target child is currently shutting down. You could then use numberOfAttempts to prevent re-queuing forever, stopping at some max number of attempts. If you don't feel completely comfortable relying on DeathWatch, you could add a time-to-live component to the items in the list of children shutting down. You could then prune items as you encounter them if they are in the list but have been in there too long.
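The retry counter and time-to-live pruning can be sketched as plain Scala, independent of the actor plumbing. Everything here other than `DoPerUserWork` and `numberOfAttempts` is a hypothetical name (`StoppingChildren`, `Decision`, the default `ttl` and `maxAttempts` values), and `userId` is simplified to a `String`.

```scala
import scala.concurrent.duration._

// The question's message, extended with the suggested retry counter
final case class DoPerUserWork(userId: String, payload: Any, numberOfAttempts: Int = 0)

sealed trait Decision
case object Deliver extends Decision
final case class Requeue(msg: DoPerUserWork) extends Decision
final case class GiveUp(msg: DoPerUserWork) extends Decision

// Children being stopped, keyed by name, with the time (ms) the stop was requested
final case class StoppingChildren(
    since: Map[String, Long] = Map.empty,
    ttl: FiniteDuration = 5.seconds,
    maxAttempts: Int = 3) {

  def markStopping(name: String, nowMillis: Long): StoppingChildren =
    copy(since = since + (name -> nowMillis))

  def markTerminated(name: String): StoppingChildren =
    copy(since = since - name)

  // Returns the (possibly pruned) state and what to do with the message
  def decide(msg: DoPerUserWork, nowMillis: Long): (StoppingChildren, Decision) =
    since.get(msg.userId) match {
      case None =>
        (this, Deliver)
      case Some(t) if nowMillis - t > ttl.toMillis =>
        // Entry outlived its time-to-live: prune it and assume the child is gone
        (markTerminated(msg.userId), Deliver)
      case Some(_) if msg.numberOfAttempts >= maxAttempts =>
        (this, GiveUp(msg))
      case Some(_) =>
        (this, Requeue(msg.copy(numberOfAttempts = msg.numberOfAttempts + 1)))
    }
}
```

Inside the registry's receive, `Requeue(m)` would translate to `self.forward(m)`, `GiveUp(m)` to logging or replying with a failure, and `Deliver` to the usual get-or-create and forward.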

cmbaxter answered Nov 10 '22 11:11
