
AWS ECS 503 Service Temporarily Unavailable while deploying

I am using Amazon Web Services EC2 Container Service with an Application Load Balancer for my app. When I deploy a new version, I get 503 Service Temporarily Unavailable for about 2 minutes. It is a bit more than the startup time of my application. This means that I cannot do a zero-downtime deployment now.

Is there a setting to not use the new tasks while they are starting up? Or what am I missing here?

UPDATE:

The health check numbers for the target group of the ALB are the following:

Healthy threshold:     5
Unhealthy threshold:   2
Timeout:               5 seconds
Interval:              30 seconds
Success codes:         200 OK

Healthy threshold is 'The number of consecutive health check successes required before considering an unhealthy target healthy.'
Unhealthy threshold is 'The number of consecutive health check failures required before considering a target unhealthy.'
Timeout is 'The amount of time, in seconds, during which no response means a failed health check.'
Interval is 'The approximate amount of time between health checks of an individual target'
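The settings above already account for most of the 2 minutes: a newly registered target must pass the healthy threshold number of consecutive checks before the ALB sends it traffic. A rough sketch of that arithmetic follows. The function name is hypothetical, and it assumes the app answers 200 from the very first check and that the first check lands somewhere between immediately and one full interval after registration.

```python
from typing import Tuple


def seconds_until_healthy(healthy_threshold: int, interval_s: int) -> Tuple[int, int]:
    """Rough (best case, worst case) seconds before a new target counts as healthy.

    Assumption: every check succeeds, so only the number of checks and the
    spacing between them matter.
    """
    # Best case: the first check happens right after registration, so only
    # (threshold - 1) intervals have to pass.
    best = (healthy_threshold - 1) * interval_s
    # Worst case: the first check happens almost one full interval later.
    worst = healthy_threshold * interval_s
    return best, worst


best, worst = seconds_until_healthy(healthy_threshold=5, interval_s=30)
print(f"new target healthy after roughly {best}-{worst} s")
```

With a healthy threshold of 5 and an interval of 30 seconds this gives roughly 120 to 150 seconds, on top of the application's own startup time.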

UPDATE 2: My cluster consists of two EC2 instances, but it can scale up if needed. The desired and minimum count is 2. I run one task per instance, because my app needs a specific port number. Before I deploy (Jenkins runs an AWS CLI script), I set the number of instances to 4. Without this, AWS cannot deploy my new tasks (this is another issue to solve). The networking mode is bridge.

asked Jul 05 '17 15:07 by vargen_




2 Answers

The issue seems to lie in the port mappings of my container settings in the task definition. Previously I was using 80 as the host port and 8080 as the container port. I thought I needed these exact values, but the host port can actually be any value. If you set it to 0, ECS assigns a port in the range 32768-61000, which makes it possible to run multiple tasks on one instance. For this to work, I also had to change my security group to allow traffic from the ALB to the instances on these ports.
Once ECS can run multiple tasks on the same instance, the 50/200 min/max healthy percent setting makes sense, and a new task revision can be deployed without adding new instances. This also makes the deployment zero-downtime.
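As an illustration, the container part of such a task definition could be built like this. It is only a sketch: the container name and image are hypothetical placeholders, and the helper function is not part of any AWS library. The `containerPort`, `hostPort`, and `protocol` keys are the port mapping fields of an ECS container definition.

```python
import json


def port_mapping(container_port: int, host_port: int = 0) -> dict:
    """Build one ECS port mapping; host_port=0 asks ECS to pick a dynamic host port."""
    return {
        "containerPort": container_port,
        "hostPort": host_port,
        "protocol": "tcp",
    }


container_definition = {
    "name": "my-app",          # hypothetical container name
    "image": "my-app:latest",  # hypothetical image
    # Before: port_mapping(8080, 80) pinned host port 80, so one task per instance.
    # After: host port 0 lets several tasks share an instance in bridge mode.
    "portMappings": [port_mapping(8080)],
}

print(json.dumps(container_definition, indent=2))
```

The printed JSON is the fragment that would go into the `containerDefinitions` list of the task definition.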

Thank you to everybody who asked or commented!

answered Oct 20 '22 22:10 by vargen_


Since you are using AWS ECS, may I ask what the service's "minimum healthy percent" and "maximum percent" are?

Make sure that you have a "maximum percent" of 200 and a "minimum healthy percent" of 50, so that not all of your tasks go down during deployment.

Please find the documentation definition of these two terms:

Maximum percent provides an upper limit on the number of running tasks during a deployment enabling you to define the deployment batch size.

Minimum healthy percent provides a lower limit on the number of running tasks during a deployment enabling you to deploy without using additional cluster capacity.

A limit of 50 for "minimum healthy percent" ensures that at most half of your service's containers are stopped before the new version is deployed. For example, if the desired task count of the service is 2, then at deployment time only 1 container running the old version is stopped first. Once the new version is running, the second old container is stopped and another new-version container is started. This ensures that at any given time there are containers handling requests.

Similarly, a limit of 200 for "maximum percent" tells the ECS service scheduler that during a deployment the number of running tasks may grow to at most double the desired task count.
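The two limits can be turned into concrete task counts with a small calculation. This is a sketch with a hypothetical function name; it assumes the rounding described in the ECS documentation, where the minimum is rounded up and the maximum is rounded down.

```python
from typing import Tuple


def deployment_task_bounds(desired_count: int,
                           minimum_healthy_percent: int,
                           maximum_percent: int) -> Tuple[int, int]:
    """Return (fewest, most) tasks that may be running during a deployment."""
    # Lower bound is rounded up (ceiling division using integers only).
    lower = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound is rounded down.
    upper = desired_count * maximum_percent // 100
    return lower, upper


print(deployment_task_bounds(2, 50, 200))  # -> (1, 4)
```

For the service in the question (desired count 2, 50/200), at least 1 task keeps running while up to 4 may run at once, provided the instances have room for the extra tasks.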

Please let me know if you have any further questions.

answered Oct 20 '22 22:10 by Manish Joshi