
AWS Network load balancer - What is client reset count (and why is it high)

The documentation for the various client/target/ELB reset count metrics (TCP_Client_Reset_Count, TCP_Target_Reset_Count, TCP_ELB_Reset_Count) just says they count RST packets. I tried to understand what an RST packet is, and it seems to have to do with broken TCP connections. My load balancer has a single, long-lived, seemingly successful client connection. Why do I see on the order of 100 client resets per hour? I also see about 10 load balancer resets per hour, and 0 target resets.

EDIT: I just observed that increasing the size of the server instance (I'm using Fargate; I went from 0.25 vCPU to 0.5) led to a 10-fold reduction in client resets per hour. The number of load balancer resets did not change.

asked Mar 28 '18 17:03 by Aleksandr Dubinsky




2 Answers

My hunch is that this is related to a bug in the Network Load Balancer that causes it to send 100x as many health checks as it should. See: "NLB Target Group health checks are out of control". My theory is that the bug causes the health check connection to be broken uncleanly if the target instance does not respond quickly enough. These broken health check connections get reported as "client resets" even though they should be reported as "ELB resets" or not reported at all.

answered Oct 31 '22 02:10 by Aleksandr Dubinsky


There are many reasons for a TCP RST to be sent. Some are abnormal, meaning errors, and some are normal connection cleanups that the TCP/IP stack or the application performs.

An example of a normal TCP RST would be a long-lived connection that exceeds some time limit imposed by one side or the other. Once the time limit is exceeded, the connection can be forcibly closed, which generates the RST.

An example of an abnormal TCP RST would be an application that abruptly disconnects due to an internal error.

A poorly written application can also cause TCP RST when it does not perform graceful shutdowns on the TCP socket before closing the connection.
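To make the difference between a graceful and an abrupt close concrete, here is a minimal sketch over loopback. It is only an illustration, not what the NLB or your client actually does: the function name `close_and_observe` is made up for this example, and the `struct.pack("ii", ...)` layout for `SO_LINGER` is an assumption that holds on Linux and macOS (Windows uses a different layout).

```python
import socket
import struct


def close_and_observe(abortive: bool) -> str:
    """Open a loopback TCP connection, close the client side,
    and report what the server side sees on its next recv()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    server_side, _ = listener.accept()
    server_side.settimeout(5)  # avoid hanging forever if nothing arrives

    if abortive:
        # SO_LINGER with l_onoff=1, l_linger=0 makes close() drop the
        # connection immediately and send RST instead of FIN.
        # Assumption: Linux/macOS struct layout (two C ints).
        client.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
        )
    else:
        # Graceful path: shutdown() sends FIN before the socket is closed.
        client.shutdown(socket.SHUT_WR)
    client.close()

    try:
        data = server_side.recv(1024)
        result = "clean EOF" if data == b"" else "data"
    except ConnectionResetError:
        result = "reset"
    finally:
        server_side.close()
        listener.close()
    return result


if __name__ == "__main__":
    print("graceful close ->", close_and_observe(abortive=False))
    print("abortive close ->", close_and_observe(abortive=True))
```

On a typical Linux host the graceful case should show up on the peer as an ordinary end-of-stream, while the abortive case should surface as a connection reset. The second kind is what a reset counter would pick up.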

My guess is that the behavior you are seeing is not a problem. However, to really know, you will need to do a wire trace and protocol analysis on each connection to determine exactly what is happening.
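As a rough idea of what that protocol analysis looks for: the RST flag is bit 0x04 of the flags byte, which sits at offset 13 of the TCP header. The sketch below counts resets in a list of raw TCP headers. The helper names (`make_header`, `count_resets`) and the sample ports are hypothetical and the headers are synthetic; in practice you would get the packets from a capture tool such as tcpdump or Wireshark rather than build them by hand.

```python
import struct

# TCP flag bits in the flags byte (offset 13 of the TCP header).
TCP_FLAG_FIN = 0x01
TCP_FLAG_RST = 0x04
TCP_FLAG_ACK = 0x10


def make_header(flags: int, src_port: int = 50000, dst_port: int = 443) -> bytes:
    """Build a synthetic 20-byte TCP header (no options) for illustration."""
    data_offset = 5 << 4  # header length of 5 32-bit words, in the high nibble
    return struct.pack(
        "!HHIIBBHHH",
        src_port, dst_port,
        0,            # sequence number
        0,            # acknowledgment number
        data_offset,
        flags,
        65535,        # window size
        0,            # checksum (not computed in this sketch)
        0,            # urgent pointer
    )


def has_rst(tcp_header: bytes) -> bool:
    """Return True if the RST flag is set in a raw TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("TCP header must be at least 20 bytes")
    return bool(tcp_header[13] & TCP_FLAG_RST)


def count_resets(headers) -> int:
    """Count how many of the given TCP headers carry the RST flag."""
    return sum(1 for header in headers if has_rst(header))


if __name__ == "__main__":
    sample = [
        make_header(TCP_FLAG_ACK),                 # ordinary traffic
        make_header(TCP_FLAG_RST | TCP_FLAG_ACK),  # reset
        make_header(TCP_FLAG_FIN | TCP_FLAG_ACK),  # graceful close
        make_header(TCP_FLAG_RST),                 # reset
    ]
    print("resets seen:", count_resets(sample))
```

Knowing which side sent each RST, and at what point in the connection, is what lets you tell a normal timeout cleanup apart from an error.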

answered Oct 31 '22 04:10 by John Hanley