Limit TCP connections to target behind AWS Application Load Balancer

I have an application/target behind an AWS ALB and would like to place a hard cap on the number of TCP connections it will receive.

If I understand correctly, an ALB target can be either

  • Healthy -- ALB will route traffic to the target.

or

  • Unhealthy -- ALB will not route traffic to the target. Furthermore, it will drain/deregister/restart the target as soon as it can (I couldn't find this in the docs, but this is the behavior I've observed).

Ideally I would put the target into a third state that says "Don't kill me but don't route traffic to me either" when the connection cap is reached (whereupon I would spawn more targets to meet demand).

There isn't such a third state but is there another way to place a cap on the number of connections?

Chris Evans asked Oct 24 '16


People also ask

Does application load balancer support TCP?

The AWS Classic Load Balancer (CLB) operates at Layer 4 of the OSI model. What this means is that the load balancer routes traffic between clients and backend servers based on IP address and TCP port.

How many connections can a load balancer handle?

Your load balancer uses these IP addresses to establish connections with the targets. Depending on your traffic profile, the load balancer can scale higher and consume up to a maximum of 100 IP addresses distributed across all enabled subnets.

How much traffic can an AWS load balancer handle?

Network Load Balancer currently supports 200 targets per Availability Zone. For example, if you are in two AZs, you can have up to 400 targets registered with Network Load Balancer. If cross-zone load balancing is on, then the maximum targets reduce from 200 per AZ to 200 per load balancer.

How do I restrict traffic on NLB?

You cannot allow traffic from clients to targets through the load balancer using the security groups for the clients in the security groups for the targets. Use the client CIDR blocks in the target security groups instead.


1 Answer

There is one main misconception in the question itself, so I'll address that first.

an ALB target can be [...] Unhealthy -- ALB will not route traffic to the target. Furthermore, it will drain/deregister/restart the target as soon as it can (I couldn't find this in the docs, but this is the behavior I've observed).

That's not really what's going on.

An ALB is a Load Balancer: it will route requests to targets, according to some routing logic that you can configure to a certain extent.

It will also perform health checks, which it uses to determine, from the ALB's perspective, whether the target is healthy or unhealthy.

Here's the misconception: the only thing that the ALB will do when a target is deemed unhealthy is that it will stop sending new requests to it. That's all.

The ALB itself doesn't have the ability to (1) deregister or (2) restart the target. In fact, on its own, it will keep performing health checks and whenever the target becomes healthy again, it will start sending traffic again.
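
To make that concrete: deregistration only ever happens because something other than the ALB calls the ELBv2 API. A minimal boto3 sketch (the target group ARN and instance ID below are made-up placeholders):

```python
# Hedged sketch: deregistration is an explicit API call made by an operator, a script,
# or an Auto Scaling Group -- never something the ALB decides to do on its own.
# The target group ARN and instance ID are placeholders.
import boto3

elbv2 = boto3.client("elbv2")

elbv2.deregister_targets(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:"
                   "targetgroup/my-targets/0123456789abcdef",
    Targets=[{"Id": "i-0123456789abcdef0"}],
)
```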

The behavior you say you observed is also probably not exactly what happened. You said the target was deregistered and restarted. Unless you have something incredibly custom (highly unlikely), the targets weren't restarted; they were replaced. That's a huge difference.

Let's assume that's the behavior that was actually happening.

The reason it's happening is almost certainly that there's an Auto Scaling Group integrated with the ALB (one of the most common designs on AWS). The Auto Scaling Group can integrate its health checks with the ALB (i.e., the ALB's report of target health is used by the ASG). When the ASG determines that an instance is unhealthy (e.g., via that integration with the ALB), it proceeds to replace the instance, so that it maintains a number of healthy instances equal to its DesiredCapacity.
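
If that's your setup, the ASG-side switch that enables this behavior is the health check type. A boto3 sketch, assuming an existing group named "my-asg" (a placeholder):

```python
# Hedged sketch: make an existing Auto Scaling Group use the ALB's target health,
# which is what causes it to replace instances the ALB reports as unhealthy.
# "my-asg" and the grace period are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-asg",
    HealthCheckType="ELB",        # consider ALB target health, not just EC2 status checks
    HealthCheckGracePeriod=300,   # seconds after launch before health checks start counting
)
```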


Now, back to the problem: in short, there's no way at the ALB level to put a hard cap on the number of connections a target will receive.

In practical terms, you need to (1) prevent that situation of saturation from happening in the first place, and (2) decide what to do when it happens.

To prevent it from happening, you need to ensure you always have enough instances to handle the current amount of traffic, plus the projected increase between the moment you can detect traffic rising and the moment new instances are launched and in service. For example, you could use a CloudWatch Alarm based on the average number of connections per target and have it trigger Auto Scaling (before going that route, make sure that "number of connections" is really the best scaling metric; see the final note below). Check how fast you can put new instances in service, and how much over-provisioning you need to maintain so that there's enough headroom between detecting the increase in load and having the new instances ready.
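
One way to wire that up, sketched with boto3 and hedged accordingly: a target-tracking scaling policy on the predefined ALBRequestCountPerTarget metric (requests rather than connections, per the final note below). The ASG name, resource label, and target value are placeholders:

```python
# Hedged sketch: scale the group to keep the average request count per target near a
# chosen value, so new instances are launched before the existing ones saturate.
# All names and numbers below are placeholders; tune them to your workload.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",
    PolicyName="requests-per-target",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # format: app/<alb-name>/<alb-id>/targetgroup/<tg-name>/<tg-id>
            "ResourceLabel": "app/my-alb/0123456789abcdef/targetgroup/my-targets/fedcba9876543210",
        },
        "TargetValue": 1000.0,   # average requests per target to aim for
    },
)
```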

What to do when it happens? You have mainly two general choices here:

  • your target can accept and process the request in a possibly "degraded" situation (i.e., you're processing more requests than your spec, so they all might get slower, or they may fail due to downstream issues, etc);

  • or you can quickly reject the request (but double check it isn't an ALB health check request! you should keep accepting and processing those) and return an error message to the caller, a.k.a. load shedding (a minimal sketch follows this list).
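
Here's a minimal load-shedding sketch, not the exact approach from this answer, just one way to do it under the assumption that the ALB health check hits a /health path: count in-flight requests, return 503 once a cap is exceeded, but always answer the health check.

```python
# Hedged sketch of load shedding on the target itself. The /health path, port, and cap
# are assumptions/placeholders; the point is to shed real work while never shedding the
# ALB health check, so the instance stays "healthy" from the ALB's and ASG's perspective.
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_IN_FLIGHT = 100   # placeholder cap; set it to what one instance can actually handle

_in_flight = 0
_lock = threading.Lock()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        global _in_flight
        if self.path == "/health":
            # Never reject the ALB health check, or the ASG will replace this instance.
            self._respond(200, b"ok")
            return
        with _lock:
            over_cap = _in_flight >= MAX_IN_FLIGHT
            if not over_cap:
                _in_flight += 1
        if over_cap:
            self._respond(503, b"overloaded, try again later")
            return
        try:
            self._respond(200, b"handled the request")   # real work would happen here
        finally:
            with _lock:
                _in_flight -= 1

    def _respond(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```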

In either case, you should decide whether you want to "wait it out", or start a process to add new instances to handle the additional load. This decision usually comes down to determining how likely the increase in traffic is to be persistent or just a temporary, short spike.

One thing you shouldn't do is mess with health checks for that purpose. If you reject the health check requests from the ALB, it will mark the instance as unhealthy and, if you have an ASG (you should), the ASG will kill the instance (leading to even more load on the remaining instances while this one is replaced). Additionally, a situation of "I'm healthy but saturated" would be indistinguishable to the ALB from "I'm really having issues and need to be replaced".

As a final note: keep in mind that an ALB isn't really dealing with "connections", but rather with "requests" (i.e., it operates at a higher level of abstraction). What this means is that "number of connections" might not be a good metric to scale on, since the ALB can and most likely will multiplex requests from a lot of clients into a smaller number of connections to a target. That is, if the ALB receives TCP connections from 10 different clients, it may only open 5 (or whatever other number) connections to a target and send requests from all 10 clients through only those 5 connections.

Bruno Reis answered Nov 03 '22