This is an odd one. I have an ECS service running on Fargate (platform version 1.4) in a private subnet. Since the tasks don't have access to the Internet, I had to configure VPC endpoints so that tasks could load what they needed from AWS services (e.g. secrets from SSM, the image from ECR, etc.). This was all well and good and worked just fine, until it didn't. I'm not sure what changed, but one weekend I noticed my servers weren't running anymore, and I found this error in the console:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secrets from ssm: service call has been retried 1 time(s): RequestError: send request failed caused by: Post https://ssm.us-ea...
That looked familiar from when I was configuring the VPC endpoints, so I went through the console to make sure nothing changed. As far as I can tell, the configuration looks right (security groups have the proper ingress/egress rules, proper endpoints are configured and connected to the VPC my servers are in, everything is in the same AZ, IAM roles have access to the secret).
As an experiment, I removed the secrets I was trying to load from the task definition to see what would happen. When a new server spun up, I saw a similar error, but this time for loading the image from ECR:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): RequestError: send request failed caused by: Post https://api.ecr....
I also tried to delete and recreate all of the endpoints, just in case, and still no success.
Other (potentially) useful information:
Any help/tips would be appreciated.
To run Fargate tasks in a private subnet without internet access, use VPC endpoints. VPC endpoints allow you to run Fargate tasks without granting the tasks access to the internet. The required endpoints are accessed over a private IP address.
If you configure your VPC with an internet gateway or an outbound-only internet gateway, Amazon ECS tasks on Fargate that are assigned an IPv6 address can access the internet. NAT gateways aren't needed.
Considerations for Amazon ECS VPC endpoints

Tasks using the Fargate launch type don't require the interface VPC endpoints for Amazon ECS, but you might need interface VPC endpoints for Amazon ECR, Secrets Manager, or Amazon CloudWatch Logs described in the following points.
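As a rough illustration of the considerations above, here is a minimal sketch that lists the VPC endpoint service names a Fargate task in a private subnet is likely to need and reports which ones are missing. It assumes platform version 1.4, where image pulls use both ECR endpoints plus the S3 gateway endpoint. The function names, the flags, and the region string are hypothetical, and the list of existing endpoints is passed in by hand rather than fetched from AWS.

```python
def required_endpoint_services(region, uses_ssm=False,
                               uses_secrets_manager=False,
                               uses_awslogs=False):
    """Return the endpoint service names a private Fargate task likely needs."""
    # Image pulls: ECR API, ECR Docker registry, and S3 (image layers).
    services = [
        f"com.amazonaws.{region}.ecr.api",
        f"com.amazonaws.{region}.ecr.dkr",
        f"com.amazonaws.{region}.s3",
    ]
    # Optional services, depending on what the task definition references.
    if uses_ssm:
        services.append(f"com.amazonaws.{region}.ssm")
    if uses_secrets_manager:
        services.append(f"com.amazonaws.{region}.secretsmanager")
    if uses_awslogs:
        services.append(f"com.amazonaws.{region}.logs")
    return services


def missing_endpoint_services(required, existing):
    """Return the required service names that are not in `existing`."""
    existing_set = set(existing)
    return [name for name in required if name not in existing_set]


if __name__ == "__main__":
    # "xx-example-1" is a placeholder, not a real region.
    needed = required_endpoint_services("xx-example-1", uses_ssm=True)
    have = ["com.amazonaws.xx-example-1.ecr.api",
            "com.amazonaws.xx-example-1.ecr.dkr"]
    for name in missing_endpoint_services(needed, have):
        print("missing endpoint:", name)
```

The existing endpoint names can be copied from the VPC console; anything the sketch reports as missing is only a candidate to investigate, not a definitive diagnosis.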
AWS Fargate is a technology that you can use with Amazon ECS to run containers without having to manage servers or clusters of Amazon EC2 instances.
Based on the discussion in the comments, the cause of the issue was determined to be an incorrect CIDR range on the security groups (SGs) for the SSM VPC service endpoint.
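To make the CIDR problem concrete, the following is a minimal sketch of the check involved: does any ingress rule on the endpoint's security group cover the task subnet on port 443? The rule dictionaries, their keys, and the CIDR values are hypothetical and typed in by hand; they are not the format returned by any AWS API.

```python
import ipaddress


def ingress_allows(subnet_cidr, rules, port=443):
    """Return True if some rule covers the whole subnet on the given port.

    Each rule is a hypothetical dict with "cidr", "from_port" and "to_port".
    """
    subnet = ipaddress.ip_network(subnet_cidr)
    for rule in rules:
        allowed = ipaddress.ip_network(rule["cidr"])
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed, so skip those.
        if subnet.version != allowed.version:
            continue
        port_ok = rule["from_port"] <= port <= rule["to_port"]
        if port_ok and subnet.subnet_of(allowed):
            return True
    return False


if __name__ == "__main__":
    # Example values only: the task subnet and two candidate SG rules.
    task_subnet = "10.0.1.0/24"
    wrong_rule = [{"cidr": "10.1.0.0/16", "from_port": 443, "to_port": 443}]
    right_rule = [{"cidr": "10.0.0.0/16", "from_port": 443, "to_port": 443}]
    print(ingress_allows(task_subnet, wrong_rule))
    print(ingress_allows(task_subnet, right_rule))
```

A rule that references a source security group instead of a CIDR range would need a different check; this sketch only covers CIDR-based rules.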
General troubleshooting recommendations for this issue are:
- Check that enableDnsHostnames and enableDnsSupport are enabled for the VPC.
- Create an instance in the same subnet as the ECS service. Use the instance (after setting up its role with permissions to SSM) to check connectivity to the SSM interface endpoint. The aim is to verify whether the issue is at the VPC level or at the ECS level.
- On the instance, the AWS CLI can be used to connect to the SSM endpoint, using either the endpoint's custom interface URL or the general SSM one.
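The instance-based check above can also be approximated without the AWS CLI. The sketch below, meant to be run on an instance in the same subnet, resolves a service hostname, checks whether it resolves to private addresses (which is what one would expect when the interface endpoint has private DNS enabled), and attempts a TCP connection on port 443. The function names are hypothetical, and the hostname is supplied on the command line rather than assumed.

```python
import ipaddress
import socket
import sys


def resolve_ips(hostname, port=443):
    """Resolve a hostname to a sorted list of unique IP address strings."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(ips):
    """Return True if the list is non-empty and every address is private."""
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)


def can_connect(hostname, port=443, timeout=3.0):
    """Return True if a TCP connection to hostname:port succeeds in time."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the service hostname for your own region as the first argument.
    host = sys.argv[1]
    ips = resolve_ips(host)
    print("resolved:", ips)
    print("resolves to private addresses:", all_private(ips))
    print("TCP 443 reachable:", can_connect(host))
```

If the name resolves to public addresses, DNS for the endpoint is the likely suspect; if it resolves to private addresses but the connection times out, the security group rules on the endpoint are worth a second look.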
This error also occurs when Auto-assign public IP is disabled while creating the Fargate task. In that case, you need to enable Auto-assign public IP to make things work.
If you're running a task using the Fargate launch type in a public subnet, then choose ENABLED for Auto-assign public IP when you launch the task. This allows your task to have outbound network access to pull an image. Source
I don't know the details, but I hope this helps anyone who arrives here from a search engine.
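For reference, the Auto-assign public IP setting corresponds to the assignPublicIp field of the awsvpc network configuration passed when a task is run. The sketch below only builds that dictionary; the helper name and the subnet and security group IDs are hypothetical, and nothing is sent to AWS.

```python
def awsvpc_network_configuration(subnets, security_groups, public_subnet):
    """Build the networkConfiguration dict for an awsvpc-mode task.

    assignPublicIp is ENABLED only for tasks in a public subnet; tasks in a
    private subnet rely on VPC endpoints or a NAT gateway instead.
    """
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            "assignPublicIp": "ENABLED" if public_subnet else "DISABLED",
        }
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    config = awsvpc_network_configuration(
        ["subnet-EXAMPLE"], ["sg-EXAMPLE"], public_subnet=True
    )
    print(config)
```

A dictionary of this shape is what a client library would typically accept as the network configuration argument when launching the task.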