
Port exhaustion issue with Akka.Cluster

We built an Akka.Cluster infrastructure for SMS, email and push notifications. The system has three kinds of nodes: client, sender and lighthouse. The client role is used by our Web application and our API application (both hosted in IIS). The lighthouse and sender roles are hosted as Windows services. We also run 4 more console instances of the same Windows service in the sender role.

We have been experiencing port exhaustion on our Web Server for about 2 weeks. The Web Server starts consuming ports quickly, and after a while we cannot perform any SQL operations. Sometimes we have no choice but to do an IIS reset. The problem occurs when more than one node is in the sender role. We diagnosed it and found the source of the problem.

---------------
HOST                  OPEN    WAIT
SRV_NOTIFICATION      3429    0
SRV_LOCAL             198     0
SRV_UNDEFINED_IPV4    23      0
SRV_DATABASE          15      0
SRV_AUTH              4       0
SRV_API               6       0
SRV_UNDEFINED_IPV6    19      0
SRV_INBOUND           12347   5

TotalPortsInUse   : 17286
MaxUserPorts      : 64510
TcpTimedWaitDelay : 30
03/23/2017 09:30:10
---------------
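To put the snapshot in perspective, here is a minimal Python sketch that parses the table text and reports the busiest host and the share of the port pool in use. The function name `parse_snapshot` and the parsing rules are illustrative assumptions; the script that produced the table is not shown here.

```python
# Snapshot text copied from the table above (illustrative parsing only).
SNAPSHOT = """\
SRV_NOTIFICATION      3429    0
SRV_LOCAL             198     0
SRV_UNDEFINED_IPV4    23      0
SRV_DATABASE          15      0
SRV_AUTH              4       0
SRV_API               6       0
SRV_UNDEFINED_IPV6    19      0
SRV_INBOUND           12347   5

TotalPortsInUse   : 17286
MaxUserPorts      : 64510
TcpTimedWaitDelay : 30
"""


def parse_snapshot(text):
    """Split the snapshot into per-host (open, wait) rows and summary values."""
    hosts, summary = {}, {}
    for line in text.splitlines():
        if ":" in line:
            # Summary lines look like "TotalPortsInUse   : 17286".
            key, value = line.split(":", 1)
            summary[key.strip()] = int(value)
        else:
            # Host lines look like "SRV_INBOUND   12347   5".
            parts = line.split()
            if len(parts) == 3:
                hosts[parts[0]] = (int(parts[1]), int(parts[2]))
    return hosts, summary


hosts, summary = parse_snapshot(SNAPSHOT)
busiest = max(hosts, key=lambda h: hosts[h][0])
utilization = summary["TotalPortsInUse"] / summary["MaxUserPorts"]
print(f"busiest host: {busiest} ({hosts[busiest][0]} open)")
print(f"port pool in use: {utilization:.1%}")
```

With the numbers above, SRV_INBOUND holds by far the most open connections, and roughly a quarter of the port pool is already in use at the time of the snapshot.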

SRV_NOTIFICATION is the server where the lighthouse and sender nodes run. SRV_INBOUND is our Web Server. After checking this table, we looked at which ports on the Web Server were assigned and got results like the ones below. In netstat there were more than 12000 connections like this:

TCP    192.168.1.10:65531     192.168.1.10:3564      ESTABLISHED     5716   [w3wp.exe]
TCP    192.168.1.10:65532     192.168.1.101:17527    ESTABLISHED     5716   [w3wp.exe]
TCP    192.168.1.10:65533     192.168.1.101:17527    ESTABLISHED     5716   [w3wp.exe]
TCP    192.168.1.10:65534     192.168.1.10:3564      ESTABLISHED     5716   [w3wp.exe]

Here 192.168.1.10 is the Web Server, 192.168.1.10:3564 is the API and 192.168.1.101:17527 is the Lighthouse.

The connections are opening but not closing.
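To see which remote endpoints hold the leaked connections, the netstat lines can be grouped by remote address. The following is a minimal Python sketch that assumes the line layout shown above (protocol, local address, remote address, state, PID, process name); the function name `count_established` is hypothetical, and a real run would read the full netstat dump instead of the four sample lines.

```python
from collections import Counter

# Sample lines in the layout shown above; replace with the full netstat output.
NETSTAT = """\
TCP    192.168.1.10:65531     192.168.1.10:3564      ESTABLISHED     5716   [w3wp.exe]
TCP    192.168.1.10:65532     192.168.1.101:17527    ESTABLISHED     5716   [w3wp.exe]
TCP    192.168.1.10:65533     192.168.1.101:17527    ESTABLISHED     5716   [w3wp.exe]
TCP    192.168.1.10:65534     192.168.1.10:3564      ESTABLISHED     5716   [w3wp.exe]
"""


def count_established(text):
    """Count ESTABLISHED TCP connections per remote endpoint."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # fields: protocol, local address, remote address, state, ...
        if len(fields) >= 4 and fields[0] == "TCP" and fields[3] == "ESTABLISHED":
            counts[fields[2]] += 1
    return counts


counts = count_established(NETSTAT)
for remote, n in counts.most_common():
    print(f"{remote}: {n}")
```

If almost all of the connections point at the Akka remoting ports (3564 and 17527 here), that suggests the leak is in the cluster transport rather than in SQL or HTTP traffic.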

After deployments, our Web and API applications leave and rejoin the cluster; they are configured with fixed ports. We monitor the cluster with the application created by @cgstevens. Even though we implemented graceful shutdown logic for the actor system, the Web and API applications sometimes cannot leave the cluster, so we have to remove the nodes manually and restart the actor system.

We reproduced the problem in our development environment and recorded the video below:

https://drive.google.com/file/d/0B5ZNfLACId3jMWUyOWliMUhNWTQ/view

The HOCON configuration for our nodes is below:

WEB and API

<akka>
    <hocon><![CDATA[
            akka{
                loglevel = DEBUG

                actor{
                    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

                    deployment {
                        /coordinatorRouter {
                            router = round-robin-group
                            routees.paths = ["/user/NotificationCoordinator"]
                            cluster {
                                    enabled = on
                                    max-nr-of-instances-per-node = 1
                                    allow-local-routees = off
                                    use-role = sender
                            }
                        }

                        /decidingRouter {
                            router = round-robin-group
                            routees.paths = ["/user/NotificationDeciding"]
                            cluster {
                                    enabled = on
                                    max-nr-of-instances-per-node = 1
                                    allow-local-routees = off
                                    use-role = sender
                            }
                        }
                    }

                    serializers {
                            wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
                    }

                    serialization-bindings {
                     "System.Object" = wire
                    }

                    debug{
                        receive = on
                        autoreceive = on
                        lifecycle = on
                        event-stream = on
                        unhandled = on
                    }
                }

                remote {
                    helios.tcp {
                            transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
                            applied-adapters = []
                            transport-protocol = tcp
                            hostname = "192.168.1.10"
                            port = 3564
                    }
                }

                cluster {
                        seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
                        roles = [client]
                }
            }
        ]]>
    </hocon>
</akka>

Lighthouse

<akka>
    <hocon>
        <![CDATA[
                lighthouse{
                        actorsystem: "notificationSystem"
                    }

                akka {
                    actor { 
                        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"

                        serializers {
                            wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
                        }

                        serialization-bindings {
                            "System.Object" = wire
                        }
                    }

                    remote {
                        log-remote-lifecycle-events = DEBUG
                        helios.tcp {
                            transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
                            applied-adapters = []
                            transport-protocol = tcp
                            #will be populated with a dynamic host-name at runtime if left uncommented
                            #public-hostname = "192.168.1.100"
                            hostname = "192.168.1.101"
                            port = 17527
                        }
                    }            

                    loggers = ["Akka.Logger.NLog.NLogLogger,Akka.Logger.NLog"]

                    cluster {
                        seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
                        roles = [lighthouse]
                    }
                }
        ]]>
    </hocon>
</akka>

Sender

<akka>
    <hocon><![CDATA[
                akka{
                    # stdout-loglevel = DEBUG
                    loglevel = DEBUG
                    # log-config-on-start = on

                    loggers = ["Akka.Logger.NLog.NLogLogger, Akka.Logger.NLog"]

                    actor{
                        debug {  
                            # receive = on 
                            # autoreceive = on
                            # lifecycle = on
                            # event-stream = on
                            # unhandled = on
                        }         

                        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"           

                        serializers {
                            wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
                        }

                        serialization-bindings {
                         "System.Object" = wire
                        }

                        deployment{                         
                            /NotificationCoordinator/LoggingCoordinator/DatabaseActor{
                                router = round-robin-pool
                                resizer{
                                    enabled = on
                                    lower-bound = 3
                                    upper-bound = 5
                                }
                            }                           

                            /NotificationDeciding/NotificationDecidingWorkerActor{
                                router = round-robin-pool
                                resizer{
                                    enabled = on
                                    lower-bound = 3
                                    upper-bound = 5
                                }
                            }

                            /ScheduledNotificationCoordinator/SendToProMaster/JobToProWorker{
                                router = round-robin-pool
                                resizer{
                                    enabled = on
                                    lower-bound = 3
                                    upper-bound = 5
                                }
                            }
                        }
                    }

                 remote{                            
                            log-remote-lifecycle-events = DEBUG
                            log-received-messages = on

                            helios.tcp{
                                transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
                                applied-adapters = []
                                transport-protocol = tcp
                                #will be populated with a dynamic host-name at runtime if left uncommented
                                #public-hostname = "POPULATE STATIC IP HERE"
                                hostname = "192.168.1.101"
                                port = 0
                        }
                    }

                    cluster {
                        seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
                        roles = [sender]
                    }
                }
            ]]></hocon>
</akka>

Cluster.Monitor

<akka>
    <hocon>
        <![CDATA[
                akka {
                    stdout-loglevel = INFO
                    loglevel = INFO
                    log-config-on-start = off 

                    actor {
                        provider = "Akka.Remote.RemoteActorRefProvider, Akka.Remote"                

                        serializers {
                            wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
                        }
                        serialization-bindings {
                            "System.Object" = wire
                        }

                        deployment {                                
                            /clustermanager {
                                dispatcher = akka.actor.synchronized-dispatcher
                            }
                        }
                    }

                    remote {
                        log-remote-lifecycle-events = INFO
                        log-received-messages = off
                        log-sent-messages = off

                        helios.tcp {                                
                            transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
                            applied-adapters = []
                            transport-protocol = tcp
                            #will be populated with a dynamic host-name at runtime if left uncommented
                            #public-hostname = "127.0.0.1"
                            hostname = "192.168.1.101"
                            port = 0
                        }
                    }            

                    cluster {                           
                        seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
                        roles = [ClusterManager]

                        client {
                            initial-contacts = ["akka.tcp://notificationSystem@192.168.1.101:17527/system/receptionist"]
                        }
                    }
                }
        ]]>
    </hocon>
</akka>
Deniz İrgin asked Mar 30 '17 20:03

1 Answer

This is a confirmed bug and will probably be fixed by the CoordinatedShutdown feature in Akka.NET v1.2:

https://github.com/akkadotnet/akka.net/issues/2575

You can use the latest nightly builds until 1.2 is released:

http://getakka.net/docs/akka-developers/nightly-builds

Edit: Akka.NET v1.2 has been released, but the fix for this bug was postponed to v1.3.

https://github.com/akkadotnet/akka.net/milestone/14

Deniz İrgin answered Nov 08 '22 09:11