
Reconnecting to Remote Akka System after Restarting the Client

Tags: java, akka
My use case is the following: an application on one machine connects to remote machines, executes scripts on them, and brings back the results. I am using the Akka framework for remoting and the Play Framework for the client application. The code of the server running on each remote machine is as follows:

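import akka.actor.ActorSystem;

public class OnCallServer {
    public static void main(String[] args) {
        OnCallServer app = new OnCallServer();
        app.executeServer();
    }

    // Starts the actor system; clients deploy their actors into it remotely.
    private void executeServer() {
        ActorSystem system = ActorSystem.create("OnCallServer");
    }
}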

(This just starts an instance of the actor system on the remote machine.)

When the client application wants to run a script on the remote machine, it deploys an actor on this remote system, and that actor executes the script.

The code of the actor that gets deployed is as follows:

public static class RemoteActor extends UntypedActor implements Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public void onReceive(Object message) throws Exception {
        Config config = context().system().settings().config();
        String host = config.getConfig("akka.remote.netty.ssl").getString("machineName");
        String sysDesc = host;
        if (message instanceof ScriptExecutionParams) {
            System.out.println("scriptParam");
            ScriptExecutionParams scriptParams = (ScriptExecutionParams) message;

            if (scriptParams.function == ScriptFunction.EXECUTE) {
                getSender().tell(executeScript(scriptParams.getName(), scriptParams.getArgument(), sysDesc), getSelf());
            } else if (scriptParams.function == ScriptFunction.DEPLOY) {
                getSender().tell(deployScript(scriptParams.getName(), scriptParams.getContent(), sysDesc), getSelf());
            } else if (scriptParams.function == ScriptFunction.REMOVE) {
                getSender().tell(removeScript(scriptParams.getName(), sysDesc), getSelf());
            }
        }
    }
}

(It receives the script parameters, performs the requested function, and returns the result.)

I am using a TCP connection over SSL for remoting. The config is as follows:

remote {
    enabled-transports = ["akka.remote.netty.ssl"]
    netty.ssl {
        hostname = "localhost"  # on the client; each remote server uses its own hostname
        port = 10174            # on the client; 10175 on the servers
        enable-ssl = true
    }
    netty.ssl.security {
        key-store = "clientKeystore.jks"
        trust-store = "clientTruststore.jks"
        key-store-password = "xxx"
        key-password = "xxx"
        trust-store-password = "xxx"
        protocol = "SSLv3"
        enabled-algorithms = [SSL_RSA_WITH_NULL_SHA]
        random-number-generator = ""
    }
}

This setup works perfectly, but sometimes the remote machine becomes unreachable. I have noticed this happening in two cases:

  1. I restart my client application
  2. No script is executed on the remote machine for a long time

Two things confuse me:

  1. On the remote machine, netstat shows that port 10175 is still open and listening
  2. After I restart the client application and try to execute the actor, the logs of the remote machine show that the actor ran successfully there, but my client application never receives the response and the request times out.
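For reference, a check like the following can confirm from the other machine that a port actually accepts TCP connections, which complements netstat on the server. This is a minimal sketch using plain Java sockets, not Akka; the class name PortCheck, the method canConnect, and the default host and timeout are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if a TCP connection to host:port can be opened within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, unknown host, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: pass the real host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10175;
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
    }
}
```

Since remoting opens connections in both directions, it is worth running such a check from the client towards the server port (10175) and from the server towards the client port (10174).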

I have tried adding a supervisorStrategy in the client actor, but it doesn't have any effect. Am I doing something wrong? If the TCP connection is the problem, is there a way to terminate the connection after each execution? If the problem is the actor system shutting down when it is not used for a long time, is there a config setting to change this? Please ask if you need more code or information.

Update

When I restart the client while testing on my local machine, there is no problem. The remote server just logs akka.remote.EndpointAssociationException messages, but it reconnects and is able to send replies. The problem arises only in production, when the apps are deployed on separate machines. I think my client is getting quarantined on restart, and akka.remote.quarantine-systems-for has been removed in the new Akka version.

Aditya Pawade asked Apr 30 '14 12:04


1 Answer

OK, I found the problem. For anyone else who runs into it: in the config files of the remote machines, I set the hostname in the netty.ssl section to each machine's own hostname, because the client application uses that name to connect. In the client application's config, however, I set the hostname to "localhost", as I thought it would not be needed anywhere.

Now, checking the logs in DEBUG mode, I found out that when the initial connection was established, the association was as follows:

2014-05-01 18:35:38.503UTC DEBUG[OnCallServer-akka.actor.default-dispatcher-3] Remoting - Associated [akka.ssl.tcp://[email protected]:10175] <- [akka.ssl.tcp://application@localhost:10174]

even though the client app was not on the server machine's localhost. This session didn't give any errors. But after the connection was lost (after restarting the client app) and I tried re-executing the script, I got this in the logs:

2014-05-01 18:36:12.045UTC ERROR[OnCallServer-akka.actor.default-dispatcher-2] a.r.EndpointWriter - AssociationError [akka.ssl.tcp://[email protected]:10175] -> [akka.ssl.tcp://application@localhost:10174]: Error [Association failed with [akka.ssl.tcp://application@localhost:10174]] [ akka.remote.EndpointAssociationException: Association failed with [akka.ssl.tcp://application@localhost:10174] Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: localhost/127.0.0.1:10174

The server app was trying to send the reply back to its own localhost, since "localhost" was the address the client had configured for itself.
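To make the address in that log line concrete, here is a minimal sketch that pulls the host and port out of an Akka remote address using only java.net.URI and flags a loopback host. It is not how Akka parses addresses internally; the class AdvertisedAddress and its methods are hypothetical helpers, and the second hostname in main is a made-up example.

```java
import java.net.URI;

class AdvertisedAddress {
    // Host part of an address such as akka.ssl.tcp://application@localhost:10174
    static String hostOf(String akkaAddress) {
        return URI.create(akkaAddress).getHost();
    }

    // Port part of the same address, or -1 if none is given.
    static int portOf(String akkaAddress) {
        return URI.create(akkaAddress).getPort();
    }

    // True if the host only makes sense on the machine that advertised it.
    // This is a simple string check; it does no DNS lookup.
    static boolean isLoopback(String akkaAddress) {
        String host = hostOf(akkaAddress);
        return host != null
                && (host.equalsIgnoreCase("localhost")
                    || host.startsWith("127.")
                    || host.equals("[::1]"));
    }

    public static void main(String[] args) {
        String fromLog = "akka.ssl.tcp://application@localhost:10174";
        // Hypothetical hostname, for comparison only.
        String fixed = "akka.ssl.tcp://application@client-host.example.com:10174";
        for (String address : new String[] {fromLog, fixed}) {
            System.out.println(hostOf(address) + ":" + portOf(address)
                    + " loopback=" + isLoopback(address));
        }
    }
}
```

A loopback host in the client's address means that any machine that connects back to it ends up dialing itself, which matches the "Connection refused: localhost/127.0.0.1:10174" error above.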

Changing the hostname in the client config to its actual hostname solved the problem.
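If hard-coding the hostname in each client's config is inconvenient, one option is to derive it at startup. The following is only a sketch under assumptions: the class ClientHostnameConfig and its methods are hypothetical, and it uses nothing but the JDK to build a one-line override for akka.remote.netty.ssl.hostname. In a real application that line would presumably be parsed with Typesafe Config (for example ConfigFactory.parseString(...).withFallback(ConfigFactory.load())) before the actor system is created; how to hook that into a Play application is not covered here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ClientHostnameConfig {
    // Builds a HOCON line that overrides the hostname setting used in the config above.
    static String hostnameOverride(String hostname) {
        return "akka.remote.netty.ssl.hostname = \"" + hostname + "\"";
    }

    // Best-effort lookup of this machine's hostname.
    // Falls back to "localhost", which would reproduce the original problem,
    // so the result should be checked before it is used.
    static String localHostname() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost";
        }
    }

    public static void main(String[] args) {
        System.out.println(hostnameOverride(localHostname()));
    }
}
```

Depending on how the machine's hosts file is set up, InetAddress.getLocalHost() may itself resolve to a loopback name, so the derived value must be something the remote servers can actually resolve and reach.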

Aditya Pawade answered Nov 01 '22 13:11