
ECS Fargate 1.4 Not Using VPC Endpoints

This is an odd one. I have an ECS service using Fargate v1.4 in a private subnet. Since the tasks don't have access to the Internet, I had to configure VPC endpoints so that tasks could load what they needed from AWS services (e.g. secrets from SSM, the image from ECR, etc.). This was all well and good and worked just fine, until it didn't. I'm not sure what changed, but one weekend I noticed my servers weren't running anymore, and I saw this error in the console:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secrets from ssm: service call has been retried 1 time(s): RequestError: send request failed caused by: Post https://ssm.us-ea...

That looked familiar from when I was configuring the VPC endpoints, so I went through the console to make sure nothing changed. As far as I can tell, the configuration looks right (security groups have the proper ingress/egress rules, proper endpoints are configured and connected to the VPC my servers are in, everything is in the same AZ, IAM roles have access to the secret).

As an experiment, I removed the secrets I was trying to load from the task definition to see what would happen. When a new server spun up, I saw a similar error, but this time for loading the image from ECR:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): RequestError: send request failed caused by: Post https://api.ecr....

I also tried to delete and recreate all of the endpoints, just in case, and still no success.

Other (potentially) useful information:

  • Region: us-east-1
  • I'm using the latest version of Pulumi
  • I'm using app autoscaling to spin down the instances during the week

Any help/tips would be appreciated.

asked May 31 '20 01:05 by c1moore




2 Answers

Based on the discussion in the comments, the cause of the issue was determined to be an incorrect CIDR range on the security groups (SGs) for the SSM VPC endpoint.
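Since the root cause here was a CIDR mismatch, one quick offline check is whether the CIDR in the endpoint security group's ingress rule actually contains the subnet the tasks run in. A minimal sketch using only Python's standard library; the function name and the CIDR values are hypothetical and not taken from the question:

```python
import ipaddress


def ingress_covers_subnet(ingress_cidr: str, task_subnet_cidr: str) -> bool:
    """Return True if every address in the task subnet lies inside the ingress CIDR."""
    allowed = ipaddress.ip_network(ingress_cidr, strict=False)
    subnet = ipaddress.ip_network(task_subnet_cidr, strict=False)
    return subnet.subnet_of(allowed)


# Hypothetical values: replace with the SG ingress CIDR and the task subnet CIDR.
print(ingress_covers_subnet("10.0.0.0/16", "10.0.2.0/24"))
print(ingress_covers_subnet("10.0.1.0/24", "10.0.2.0/24"))
```

If the check returns False for the rule on port 443, traffic from the tasks never reaches the endpoint's network interface, which would be consistent with the "send request failed" errors in the question.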

General troubleshooting recommendations for this issue are:

  • Check the ingress rules on the SGs for the VPC interface endpoint (port 443 must be open).
  • Ensure that the S3 gateway endpoint is also available and working, as it is required by SSM.
  • Check that enableDnsHostnames and enableDnsSupport are enabled for the VPC.
  • Create an instance in the same subnet as the ECS service. Use the instance (after setting up its role with permissions to SSM) to check connectivity to the SSM interface endpoint. The aim is to verify whether the issue is at the VPC level or at the ECS level.
  • On the instance, the AWS CLI can be used to connect to the SSM endpoint, using either the endpoint-specific interface URL or the general SSM URL.
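The connectivity check from a test instance can also be scripted without the AWS CLI. The sketch below resolves a hostname, reports whether it resolves only to private addresses (which is what you would expect when the interface endpoint has private DNS enabled), and attempts a TCP connection on port 443. The function names are illustrative, and the hostnames in the comment assume the standard regional naming pattern for us-east-1:

```python
import ipaddress
import socket


def resolves_privately(addresses):
    """True if at least one address is given and all of them are private."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


def check_endpoint(host, port=443, timeout=3.0):
    """Resolve host, then try a TCP connect. Returns (addresses, private, reachable)."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # DNS failure: nothing to connect to.
        return [], False, False
    addresses = sorted({info[4][0] for info in infos})
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    return addresses, resolves_privately(addresses), reachable


# Run from an instance in the task's subnet, for example (assumed hostnames):
#   check_endpoint("ssm.us-east-1.amazonaws.com")
#   check_endpoint("api.ecr.us-east-1.amazonaws.com")
```

Public addresses in the result suggest a DNS or private-DNS problem on the VPC or endpoint; private addresses with `reachable` set to False point toward security group or routing rules instead.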

answered Oct 30 '22 22:10 by Marcin


Leaving Auto-assign public IP disabled when creating the Fargate task can cause this error too. In that case, you need to enable Auto-assign public IP to make things work.

If you're running a task using the Fargate launch type in a public subnet, then choose ENABLED for Auto-assign public IP when you launch the task. This allows your task to have outbound network access to pull an image. Source
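When tasks are launched through the API rather than the console, the same setting is the `assignPublicIp` field of the `awsvpcConfiguration` block that ECS RunTask and CreateService accept. A minimal sketch of building that structure; the helper name and the subnet and security group IDs are hypothetical:

```python
def awsvpc_network_configuration(subnets, security_groups, public_subnet):
    """Build the networkConfiguration dict used by ECS RunTask / CreateService.

    assignPublicIp is ENABLED only for tasks in a public subnet; tasks in a
    private subnet keep it DISABLED and rely on VPC endpoints or NAT instead.
    """
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            "assignPublicIp": "ENABLED" if public_subnet else "DISABLED",
        }
    }


# Hypothetical IDs, for illustration only.
config = awsvpc_network_configuration(
    ["subnet-0123456789abcdef0"], ["sg-0123456789abcdef0"], public_subnet=True
)
print(config["awsvpcConfiguration"]["assignPublicIp"])
```

The resulting dict could be passed as the `networkConfiguration` argument of a boto3 `run_task` call, assuming boto3 is installed and credentials are configured.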

I don't know the details, but I hope this helps anyone who arrives here from a search engine.

answered Oct 30 '22 22:10 by KhoaHV