Akka single point of failure

I want to create a system that will not have a single point of failure. I was under the impression that routers are the tool for doing that, but I'm not sure they work as I would expect. This is the entry point of my program:

import akka.actor.{ActorSystem, Props}
import akka.cluster.routing.{ClusterRouterPool, ClusterRouterPoolSettings}
import akka.routing.RoundRobinPool
import com.typesafe.config.ConfigFactory

object Main extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application"))
  val router = system.actorOf(
    ClusterRouterPool(RoundRobinPool(0), ClusterRouterPoolSettings(
      totalInstances = 2, maxInstancesPerNode = 1,
      allowLocalRoutees = false, useRole = Some("testActor"))).props(Props[TestActor]),
    name = "testActors")
}

And this is the code for running the remote ActorSystem (so the router can deploy the TestActor code to the remote nodes):

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object TestActor extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application").getConfig("testactor1"))
  case object PrintRouterPath
}

I'm running this twice, once with testactor1 and once with testactor2.

TestActor code:

import akka.actor.{Actor, ActorLogging}
import scala.concurrent.duration._
import TestActor.PrintRouterPath

class TestActor extends Actor with ActorLogging {
  implicit val executionContext = context.dispatcher
  context.system.scheduler.schedule(10000.milliseconds, 30000.milliseconds, self, PrintRouterPath)

  override def receive: Receive = {
    case PrintRouterPath =>
      log.info(s"router is on path ${context.parent}")
  }
}

And application.conf

akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://mySys@127.0.0.1:2552",
      "akka.tcp://mySys@127.0.0.1:2553",
      "akka.tcp://mySys@127.0.0.1:2554"]
    auto-down-unreachable-after = 20s
  }
}
testactor1 {
  akka {
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      log-remote-lifecycle-events = off
      netty.tcp {
        hostname = "127.0.0.1"
        port = 2554
      }
    }
    cluster {
      roles.1 = "testActor"
      seed-nodes = [
        "akka.tcp://mySys@127.0.0.1:2552",
        "akka.tcp://mySys@127.0.0.1:2553",
        "akka.tcp://mySys@127.0.0.1:2554"]
      auto-down-unreachable-after = 20s
    }
  }
}
testactor2 {
  akka {
    actor {
      provider = "akka.cluster.ClusterActorRefProvider"
    }
    remote {
      log-remote-lifecycle-events = off
      netty.tcp {
        hostname = "127.0.0.1"
        port = 2553
      }
    }
    cluster {
      roles.1 = "testActor"
      seed-nodes = [
        "akka.tcp://mySys@127.0.0.1:2552",
        "akka.tcp://mySys@127.0.0.1:2553",
        "akka.tcp://mySys@127.0.0.1:2554"]
      auto-down-unreachable-after = 20s
    }
  }
}

Now the problem is that when the process that started the router is killed, the actors running the TestActor code stop receiving the messages that the scheduler sends. I would have expected the router to be re-deployed on another seed node in the cluster and the routees to be recovered. Is this possible? Or is there another way of implementing this flow without having a single point of failure?

asked Oct 29 '22 by user_s

1 Answer

I think that, by deploying the router on only one node, you are setting up a master-slave cluster, where the master is a single point of failure by definition.

From what I understand (looking at the docs), a router can be cluster-aware in the sense that it can deploy (pool mode) or lookup (group mode) routees on nodes in the cluster. The router itself will not react to failure by spawning somewhere else in the cluster.

I believe you have 2 options:

  1. make use of multiple routers to make your system more fault-tolerant. Routees can either be shared (group mode) or not (pool mode) between routers.

  2. make use of the Cluster Singleton pattern, which allows for a master-slave configuration where the master is automatically re-spawned in case of failure. In relation to your example, note that this behaviour is achieved by having an actor (ClusterSingletonManager) deployed on each node. This actor has the purpose of working out whether the chosen master needs to be respawned and where. None of this logic is in place in the case of a cluster-aware router like the one you set up.
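A minimal sketch of option 2, assuming Akka 2.4+ with the akka-cluster-tools module on the classpath; the object and actor names here are illustrative, not from the question:

import akka.actor.{ActorSystem, PoisonPill, Props}
import akka.cluster.singleton.{
  ClusterSingletonManager, ClusterSingletonManagerSettings,
  ClusterSingletonProxy, ClusterSingletonProxySettings
}
import com.typesafe.config.ConfigFactory

object SingletonMain extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application"))

  // Start the manager on every node with the "testActor" role; the cluster
  // picks one node (the oldest) to actually run the single TestActor instance,
  // and hands the singleton over to a surviving node if that node dies.
  system.actorOf(
    ClusterSingletonManager.props(
      singletonProps = Props[TestActor],
      terminationMessage = PoisonPill,
      settings = ClusterSingletonManagerSettings(system).withRole("testActor")),
    name = "testActorSingleton")

  // The proxy resolves the current location of the singleton and buffers
  // messages while the singleton is being moved after a failure.
  val proxy = system.actorOf(
    ClusterSingletonProxy.props(
      singletonManagerPath = "/user/testActorSingleton",
      settings = ClusterSingletonProxySettings(system).withRole("testActor")),
    name = "testActorProxy")
}

Sending through the proxy (instead of holding a direct ActorRef to the singleton) is what removes the single point of failure from the caller's perspective.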

You can find examples of multiple cluster setups in this Activator sample.
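For option 1, a group-mode router can be started on every node, so no single router process is critical. A sketch under the assumption that each "testActor" node starts its own routee locally at the well-known path /user/testActor (names are illustrative):

import akka.actor.{ActorSystem, Props}
import akka.cluster.routing.{ClusterRouterGroup, ClusterRouterGroupSettings}
import akka.routing.RoundRobinGroup
import com.typesafe.config.ConfigFactory

object GroupRouterMain extends App {
  val system = ActorSystem("mySys", ConfigFactory.load("application"))

  // On the role nodes, each node creates its own routee at a fixed path,
  // e.g.: system.actorOf(Props[TestActor], name = "testActor")

  // Any node can run this router; group mode only looks up existing routees
  // by path, so losing one router process does not take the routees down.
  val router = system.actorOf(
    ClusterRouterGroup(RoundRobinGroup(Nil), ClusterRouterGroupSettings(
      totalInstances = 2,
      routeesPaths = List("/user/testActor"),
      allowLocalRoutees = false,
      useRole = Some("testActor"))).props(),
    name = "testActorsRouter")
}

Because the routees are owned by their local nodes rather than by the router, each node can run its own copy of this router and they all share the same routees.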

answered Nov 11 '22 by Stefano Bonetti