
How to diagnose ECS Fargate task failing to start?

I'm trying to run a Docker image on AWS using their ECS service. The image runs fine locally, but it fails to start on the Fargate launch type. I've pushed the image to ECR and created a cluster, service, and task from it.

However, my cluster's task status simply reads "DEPROVISIONING (Task failed to start)", but it provides no logs or details of the output of my running image, so I have no idea what's wrong. How do I find more information and diagnose why ECS isn't able to run my image?

Cerin asked May 20 '19 22:05




3 Answers

Go to Clusters > Tasks > Details > Containers.

The error message appears in the area marked by the red rectangle in the "Error message" figure.

[Figure: Task detail]

[Figure: Error message]

Yasu answered Oct 10 '22 06:10


As Abhinav says, the message isn't very descriptive, and running aws ecs describe-tasks from the CLI doesn't add anything more. The only options are to log into the host EC2 instance and read the logs there (possible only on the EC2 launch type, since Fargate gives you no host to log into), or to send those logs to CloudWatch: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html#cwlogs_user_data
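To make the CloudWatch route concrete, here is a minimal sketch in Python of the logConfiguration block that a container definition carries when it uses the awslogs log driver. The helper name, log group, region, and stream prefix are all placeholder assumptions for illustration; none of them come from the question.

```python
def awslogs_configuration(log_group, region, stream_prefix):
    """Build the logConfiguration section of an ECS container definition.

    Uses the awslogs log driver, which sends the container's stdout and
    stderr to CloudWatch Logs.
    """
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


# Hypothetical values, for illustration only.
log_config = awslogs_configuration("/ecs/my-task", "us-east-1", "ecs")
```

The resulting dict would sit under logConfiguration in one entry of the task definition's containerDefinitions. Keep in mind that this only captures output once the container actually starts; if the task fails earlier (for example while pulling the image), there may be nothing in CloudWatch to read.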

The most likely cause (in ECS) is that the cluster doesn't have enough resources to launch the new task. You can sometimes work out the cause from the Metrics tab, or, since mid-2019 (depending on your region, I guess), you can enable "CloudWatch Container Insights" from ECS Account Settings to get more detailed information about memory and CPU reservations.

andrew lorien answered Oct 10 '22 04:10


I may be late to the party, but you can check the container's details instead of the task's.

Go to the failed task -> Details -> Container (at the bottom) and open it. Right under Details you'll see a Status reason.

[Screenshot: Opening the container details]

[Screenshot: Getting the reason for failure]

Note: if your task runs more than one container, check the 'Status reason' of each container as per the screenshot above, as it can be different between them.
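If you'd rather not click through each container, the same check can be sketched in Python. This assumes the input is a dict shaped like the JSON that aws ecs describe-tasks returns, where each task has a stoppedReason and each container may carry its own reason and exitCode; the function name and the sample data are hypothetical.

```python
def failure_reasons(describe_tasks_response):
    """Collect task-level and per-container failure details.

    Expects a dict shaped like the output of `aws ecs describe-tasks`.
    Returns a list of (source, detail) tuples, task first, then containers.
    """
    findings = []
    for task in describe_tasks_response.get("tasks", []):
        findings.append(("task", task.get("stoppedReason", "(no stoppedReason)")))
        for container in task.get("containers", []):
            detail = container.get("reason", "(no reason given)")
            exit_code = container.get("exitCode")
            if exit_code is not None:
                detail = f"{detail} [exit code {exit_code}]"
            findings.append((container.get("name", "?"), detail))
    return findings


# Hypothetical response, trimmed to the fields used above.
sample = {
    "tasks": [
        {
            "stoppedReason": "Task failed to start",
            "containers": [
                {"name": "web", "reason": "CannotPullContainerError: (details elided)"},
                {"name": "sidecar", "exitCode": 1},
            ],
        }
    ]
}

for source, detail in failure_reasons(sample):
    print(f"{source}: {detail}")
```

Listing every container side by side makes it easier to spot the case described in the note, where one container's reason differs from the others.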

Radu Diță answered Oct 10 '22 06:10