We created an Akka Cluster infrastructure for SMS, email and push notifications. There are three kinds of nodes in the system: client, sender and lighthouse. The client role is used by the Web application and the API application (both hosted in IIS). The lighthouse and sender roles are hosted as Windows services. We also run 4 more console applications of the same Windows service in the sender role.
We've been experiencing port exhaustion problems on our Web Server for about 2 weeks. The Web Server starts consuming ports quickly, and after a while we cannot do any SQL operations. Sometimes we have no choice but to do an IIS reset. The problem occurs when there is more than one node in the sender role. We diagnosed it and found the source of the problem.
---------------
HOST                 OPEN   WAIT
SRV_NOTIFICATION     3429   0
SRV_LOCAL            198    0
SRV_UNDEFINED_IPV4   23     0
SRV_DATABASE         15     0
SRV_AUTH             4      0
SRV_API              6      0
SRV_UNDEFINED_IPV6   19     0
SRV_INBOUND          12347  5
TotalPortsInUse : 17286
MaxUserPorts : 64510
TcpTimedWaitDelay : 30
03/23/2017 09:30:10
---------------
SRV_NOTIFICATION is the server where the lighthouse and sender nodes run. SRV_INBOUND is our Web Server. After checking this table, we checked which ports on the Web Server were in use and got results like the ones below. netstat showed more than 12000 connections like these:
TCP 192.168.1.10:65531 192.168.1.10:3564 ESTABLISHED 5716 [w3wp.exe]
TCP 192.168.1.10:65532 192.168.1.101:17527 ESTABLISHED 5716 [w3wp.exe]
TCP 192.168.1.10:65533 192.168.1.101:17527 ESTABLISHED 5716 [w3wp.exe]
TCP 192.168.1.10:65534 192.168.1.10:3564 ESTABLISHED 5716 [w3wp.exe]
192.168.1.10 : Web Server
192.168.1.10:3564 : API
192.168.1.101:17527 : Lighthouse
The connections are opening but not closing.
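To see which remote endpoints hold the open connections, the netstat lines can be tallied per remote address and state. The following is a minimal sketch in Python, not the tool that produced the table above. The function name `count_connections` is hypothetical, and the code assumes one connection per line in the same layout as the excerpt above (protocol, local address, remote address, state).

```python
from collections import Counter


def count_connections(netstat_lines):
    """Tally TCP connections by (remote endpoint, state).

    Assumes each line looks like the excerpt above:
    TCP <local> <remote> <state> <pid> [process]
    """
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip headers, blank lines and anything that is not a TCP row.
        if len(fields) < 4 or fields[0] != "TCP":
            continue
        remote, state = fields[2], fields[3]
        counts[(remote, state)] += 1
    return counts


# Sample input: the four lines quoted above.
sample = [
    "TCP 192.168.1.10:65531 192.168.1.10:3564 ESTABLISHED 5716 [w3wp.exe]",
    "TCP 192.168.1.10:65532 192.168.1.101:17527 ESTABLISHED 5716 [w3wp.exe]",
    "TCP 192.168.1.10:65533 192.168.1.101:17527 ESTABLISHED 5716 [w3wp.exe]",
    "TCP 192.168.1.10:65534 192.168.1.10:3564 ESTABLISHED 5716 [w3wp.exe]",
]

tally = count_connections(sample)
for (remote, state), n in tally.most_common():
    print(remote, state, n)
```

Run against a full netstat dump, a tally like this makes it easy to spot a single remote endpoint that accounts for most of the ESTABLISHED connections.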
After deployments, our Web and API applications leave and rejoin the cluster, and they are configured with fixed ports. We monitor our cluster with the application created by @cgstevens. Even though we implemented graceful shutdown logic for the actor system, sometimes the Web and API applications cannot leave the cluster, so we have to remove the nodes manually and restart the actor system.
We have reproduced the problem in our development environment and recorded the video below:
https://drive.google.com/file/d/0B5ZNfLACId3jMWUyOWliMUhNWTQ/view
Our HOCON configurations for the nodes are below:
WEB and API
<akka>
<hocon><![CDATA[
akka {
  loglevel = DEBUG
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    deployment {
      /coordinatorRouter {
        router = round-robin-group
        routees.paths = ["/user/NotificationCoordinator"]
        cluster {
          enabled = on
          max-nr-of-instances-per-node = 1
          allow-local-routees = off
          use-role = sender
        }
      }
      /decidingRouter {
        router = round-robin-group
        routees.paths = ["/user/NotificationDeciding"]
        cluster {
          enabled = on
          max-nr-of-instances-per-node = 1
          allow-local-routees = off
          use-role = sender
        }
      }
    }
    serializers {
      wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
    }
    serialization-bindings {
      "System.Object" = wire
    }
    debug {
      receive = on
      autoreceive = on
      lifecycle = on
      event-stream = on
      unhandled = on
    }
  }
  remote {
    helios.tcp {
      transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      hostname = "192.168.1.10"
      port = 3564
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
    roles = [client]
  }
}
]]>
</hocon>
</akka>
Lighthouse
<akka>
<hocon>
<![CDATA[
lighthouse {
  actorsystem: "notificationSystem"
}
akka {
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    serializers {
      wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
    }
    serialization-bindings {
      "System.Object" = wire
    }
  }
  remote {
    log-remote-lifecycle-events = DEBUG
    helios.tcp {
      transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      #will be populated with a dynamic host-name at runtime if left uncommented
      #public-hostname = "192.168.1.100"
      hostname = "192.168.1.101"
      port = 17527
    }
  }
  loggers = ["Akka.Logger.NLog.NLogLogger,Akka.Logger.NLog"]
  cluster {
    seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
    roles = [lighthouse]
  }
}
]]>
</hocon>
</akka>
Sender
<akka>
<hocon><![CDATA[
akka {
  # stdout-loglevel = DEBUG
  loglevel = DEBUG
  # log-config-on-start = on
  loggers = ["Akka.Logger.NLog.NLogLogger, Akka.Logger.NLog"]
  actor {
    debug {
      # receive = on
      # autoreceive = on
      # lifecycle = on
      # event-stream = on
      # unhandled = on
    }
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    serializers {
      wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
    }
    serialization-bindings {
      "System.Object" = wire
    }
    deployment {
      /NotificationCoordinator/LoggingCoordinator/DatabaseActor {
        router = round-robin-pool
        resizer {
          enabled = on
          lower-bound = 3
          upper-bound = 5
        }
      }
      /NotificationDeciding/NotificationDecidingWorkerActor {
        router = round-robin-pool
        resizer {
          enabled = on
          lower-bound = 3
          upper-bound = 5
        }
      }
      /ScheduledNotificationCoordinator/SendToProMaster/JobToProWorker {
        router = round-robin-pool
        resizer {
          enabled = on
          lower-bound = 3
          upper-bound = 5
        }
      }
    }
  }
  remote {
    log-remote-lifecycle-events = DEBUG
    log-received-messages = on
    helios.tcp {
      transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      #will be populated with a dynamic host-name at runtime if left uncommented
      #public-hostname = "POPULATE STATIC IP HERE"
      hostname = "192.168.1.101"
      port = 0
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
    roles = [sender]
  }
}
]]></hocon>
</akka>
Cluster.Monitor
<akka>
<hocon>
<![CDATA[
akka {
  stdout-loglevel = INFO
  loglevel = INFO
  log-config-on-start = off
  actor {
    provider = "Akka.Remote.RemoteActorRefProvider, Akka.Remote"
    serializers {
      wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
    }
    serialization-bindings {
      "System.Object" = wire
    }
    deployment {
      /clustermanager {
        dispatcher = akka.actor.synchronized-dispatcher
      }
    }
  }
  remote {
    log-remote-lifecycle-events = INFO
    log-received-messages = off
    log-sent-messages = off
    helios.tcp {
      transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      #will be populated with a dynamic host-name at runtime if left uncommented
      #public-hostname = "127.0.0.1"
      hostname = "192.168.1.101"
      port = 0
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://notificationSystem@192.168.1.101:17527"]
    roles = [ClusterManager]
    client {
      initial-contacts = ["akka.tcp://notificationSystem@192.168.1.101:17527/system/receptionist"]
    }
  }
}
]]>
</hocon>
</akka>
This is a confirmed bug and will probably be fixed by the CoordinatedShutdown feature in Akka.NET v1.2.
https://github.com/akkadotnet/akka.net/issues/2575
You can use the latest nightly builds until 1.2 is released:
http://getakka.net/docs/akka-developers/nightly-builds
Edit: Akka.NET v1.2 has been released, but the fix for this bug was postponed to v1.3.
https://github.com/akkadotnet/akka.net/milestone/14