
AWS ECS: VPC Endpoints and NAT Gateways

According to the AWS documentation on NAT Gateways, they cannot send traffic over VPC endpoints unless it is set up in the following manner:

A NAT gateway cannot send traffic over VPC endpoints [...]. If your instances in the private subnet must access resources over a VPC endpoint [...], use the private subnet’s route table to route the traffic directly to these devices.

Following this example in the docs, I created the following configuration for my ECS app:

  1. VPC (vpc-app) with CIDR 172.31.0.0/16.
  2. App subnet (subnet-app) with the following route table:
    Destination     |  Target
    ----------------|-----------
    172.31.0.0/16   |   local  
    0.0.0.0/0       |  nat-main
  3. NAT Gateway (nat-main) in vpc-app in subnet default-1 with the following route table:
    Destination     |    Target
    ----------------|--------------
    172.31.0.0/16   |     local  
    0.0.0.0/0       |  igw-xxxxxxxx
  4. Security Group (sg-app) with port 443 open for subnet-app.
  5. VPC Endpoints (Interface type) with vpc-app, subnet-app and sg-app for the following services:
    com.amazonaws.eu-west-1.ecr.api  
    com.amazonaws.eu-west-1.ecr.dkr  
    com.amazonaws.eu-west-1.ecs  
    com.amazonaws.eu-west-1.ecs-agent  
    com.amazonaws.eu-west-1.ecs-telemetry  
    com.amazonaws.eu-west-1.s3 (Gateway)

It's also important to mention that I've enabled DNS Resolution and DNS Hostnames for vpc-app, as well as the Enable Private DNS Name option for the ecr-dkr and ecr-api VPC endpoints.

I've also tried working only with Fargate containers since they don't have the added complication of the ECS Agent, and because according to the docs:

Tasks using the Fargate launch type only require the com.amazonaws.region.ecr.dkr Amazon ECR VPC endpoint and the Amazon S3 gateway endpoint to take advantage of this feature.

This doesn't work either: every time my Fargate tasks run, I see a spike in Bytes out to source under nat-main's Monitoring.

No matter what I try, the EC2 instances (and Fargate tasks) in subnet-app are still pulling images through nat-main instead of going to the local address of the ECR service.

I've restarted the ECS Agent and made sure to check all the boxes in the ECS Interface VPC Endpoints guide AND the ECR Interface Endpoints guide.

What am I missing here?

Any help would be appreciated.

kutacoder asked Jan 27 '23 09:01


2 Answers

After many hours of trial and error, and with lots of help from @jogold, the missing piece was found in this blog post:

The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.

After I created the S3 Gateway VPCE, I forgot to add a route for it to subnet-app's route table. So although the initial request to my ECR URI went to the internal address, the image layers were still downloaded from S3 through the NAT Gateway.

After adding the entry, the network usage of the NAT Gateway dropped dramatically.
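
To illustrate why the missing route mattered, here is a minimal sketch of how a route table picks a target: the most specific matching prefix wins. The S3 CIDR 52.218.0.0/17, the sample S3 address and the target name vpce-s3 are placeholders made up for illustration. A real gateway endpoint route uses an AWS-managed prefix list (pl-...) as its destination rather than a single CIDR.

```python
import ipaddress


def pick_route(route_table, destination_ip):
    """Return the target of the most specific route matching destination_ip.

    route_table is a list of (cidr, target) pairs; None means no route matched.
    """
    ip = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in route_table:
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # Longest prefix match: the route with the largest prefix length wins.
    return max(matches, key=lambda match: match[0])[1]


# subnet-app's route table as described in the question.
before = [("172.31.0.0/16", "local"), ("0.0.0.0/0", "nat-main")]

# The same table with a route to the S3 gateway endpoint added.
# "52.218.0.0/17" and "vpce-s3" are illustrative placeholders.
after = before + [("52.218.0.0/17", "vpce-s3")]

s3_ip = "52.218.1.10"  # hypothetical S3 address inside the placeholder range

print(pick_route(before, s3_ip))         # nat-main
print(pick_route(after, s3_ip))          # vpce-s3
print(pick_route(after, "172.31.5.4"))   # local
```

Without the extra route, the only entry that matches an S3 address is 0.0.0.0/0, so layer downloads leave through nat-main even though the ECR API calls already use the interface endpoints.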

More information on how to set up a Gateway VPCE can be found here.

kutacoder answered Feb 01 '23 12:02


Interface VPC endpoints work with DNS resolution, not routing.

In order for your configuration to work, you need to ensure that you checked Enable Private DNS Name when you created the endpoint. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames.


From the documentation:

When you create an interface endpoint, we generate endpoint-specific DNS hostnames that you can use to communicate with the service. For AWS services and AWS Marketplace partner services, you can optionally enable private DNS for the endpoint. This option associates a private hosted zone with your VPC. The hosted zone contains a record set for the default DNS name for the service (for example, ec2.us-east-1.amazonaws.com) that resolves to the private IP addresses of the endpoint network interfaces in your VPC. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames. For example, if your existing applications make requests to an AWS service, they can continue to make requests through the interface endpoint without requiring any configuration changes.

The alternative is to update your application to use your endpoint-specific DNS hostnames.
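
One way to check whether private DNS is taking effect is to resolve a service's default hostname from inside the subnet and see whether every returned address falls inside the VPC CIDR. The sketch below assumes the 172.31.0.0/16 range from the question. The function names are made up for illustration, and the resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket

# vpc-app's CIDR from the question.
VPC_CIDR = ipaddress.ip_network("172.31.0.0/16")


def resolve_ipv4(hostname):
    """Resolve hostname to a sorted list of unique IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def addresses_in_vpc(addresses, vpc_cidr=VPC_CIDR):
    """True only if there is at least one address and all are inside vpc_cidr."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr) in vpc_cidr for addr in addresses
    )


def uses_private_dns(hostname, resolver=resolve_ipv4):
    """Report whether hostname resolves only to addresses inside the VPC."""
    return addresses_in_vpc(resolver(hostname))
```

Run from an instance in subnet-app, `uses_private_dns("api.ecr.eu-west-1.amazonaws.com")` should return True once the endpoint's private hosted zone is in use. That hostname is my assumption about the default ECR API name for this region. A False result suggests the name still resolves to public addresses, in which case requests will follow the default route.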

Note that to use private DNS names, DNS resolution and DNS hostnames must be enabled for your VPC.

Also note that in order to use ECR/ECS without a NAT gateway, you need to configure an S3 endpoint (gateway type, which requires a route table update) so that instances can download the image layers from the underlying private Amazon S3 buckets that host them. More information is available in Setting up AWS PrivateLink for Amazon ECS and Amazon ECR.
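
As a rough sketch of that last step, the S3 gateway endpoint can be created together with its route table association in a single call. The helper names below are hypothetical, and `ec2_client` is assumed to be a boto3 EC2 client (`boto3.client("ec2")`). The parameters are passed straight to its `create_vpc_endpoint` method.

```python
def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Build keyword arguments for the EC2 create_vpc_endpoint call."""
    if not route_table_ids:
        # AWS allows creating the endpoint with no route tables, but then no
        # subnet routes S3 traffic through it. This guard is my own addition
        # to catch that mistake early.
        raise ValueError("pass at least one route table ID")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(ec2_client, vpc_id, region, route_table_ids):
    """Create the endpoint; ec2_client is assumed to be boto3.client("ec2")."""
    params = s3_gateway_endpoint_params(vpc_id, region, route_table_ids)
    return ec2_client.create_vpc_endpoint(**params)


# Placeholder IDs for illustration only.
print(s3_gateway_endpoint_params("vpc-xxxxxxxx", "eu-west-1", ["rtb-xxxxxxxx"]))
```

The route table to pass is the one associated with the private subnet (subnet-app in the question), not the one used by the NAT gateway's subnet. For an endpoint that already exists, I believe the equivalent is `modify_vpc_endpoint` with `AddRouteTableIds`.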

jogold answered Feb 01 '23 12:02