According to the AWS documentation on NAT Gateways, they cannot send traffic over VPC endpoints, unless it is setup in the following manner:
A NAT gateway cannot send traffic over VPC endpoints [...]. If your instances in the private subnet must access resources over a VPC endpoint [...], use the private subnet’s route table to route the traffic directly to these devices.
Following this example in the docs, I created the following configuration for my ECS app:
vpc-app
) with CIDR 172.31.0.0/16.subnet-app
) with the following route table: Destination | Target
----------------|-----------
172.31.0.0/16 | local
0.0.0.0/0 | nat-main
nat-main
) in vpc-app
in subnet default-1
with the following Route Table: Destination | Target
----------------|--------------
172.31.0.0/16 | local
0.0.0.0/0 | igw-xxxxxxxx
sg-app
) with port 443 open for subnet-app
.vpc-app
, subnet-app
and sg-app
for the following services: com.amazonaws.eu-west-1.ecr.api
com.amazonaws.eu-west-1.ecr.dkr
com.amazonaws.eu-west-1.ecs
com.amazonaws.eu-west-1.ecs-agent
com.amazonaws.eu-west-1.ecs-telemetry
com.amazonaws.eu-west-1.s3 (Gateway)
It's also important to mention that I've enabled DNS Resolution and DNS Hostnames for vpc-app
, as well as the Enable Private DNS Name option for the ecr-dkr
and ecr-api
VPC endpoints.
I've also tried working only with Fargate containers since they don't have the added complication of the ECS Agent, and because according to the docs:
Tasks using the Fargate launch type only require the com.amazonaws.region.ecr.dkr Amazon ECR VPC endpoint and the Amazon S3 gateway endpoint to take advantage of this feature.
This also doesn't work and every time my Fargate tasks run I see a spike in Bytes out to source under nat-main
's Monitoring.
No matter what I try, the EC2 instances (and Fargate tasks) in the subnet-app
are still pulling images using nat-main
and not going to the local address of the ECR service.
I've restarted the ECS Agent and made sure to check all the boxes in the ECS Interface VPC Endpoints guide AND the ECR Interface Endpoints guide.
What am I missing here?
Any help would be appreciated.
After many hours of trial and error, and with lots of help from @jogold, the missing piece was found in this blog post:
The next step is to create a gateway VPC endpoint for S3. This is necessary because ECR uses S3 to store Docker image layers. When your instances download Docker images from ECR, they must access ECR to get the image manifest and S3 to download the actual image layers.
After I created the S3 Gateway VPCE, I forgot to add its address to subnet-app
's routing table, so although the initial request to my ECR URI was made using the internal address, the downloading of the image from S3 still used the NAT Gateway.
After adding the entry, the network usage of the NAT Gateway dropped dramatically.
More information on how to setup Gateway VPCE can be found here.
Interface VPC endpoints work with DNS resolution, not routing.
In order for you configuration to work, you need to ensure that you checked Enable Private DNS Name when you created the endpoint. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames.
From the documentation:
When you create an interface endpoint, we generate endpoint-specific DNS hostnames that you can use to communicate with the service. For AWS services and AWS Marketplace partner services, you can optionally enable private DNS for the endpoint. This option associates a private hosted zone with your VPC. The hosted zone contains a record set for the default DNS name for the service (for example, ec2.us-east-1.amazonaws.com) that resolves to the private IP addresses of the endpoint network interfaces in your VPC. This enables you to make requests to the service using its default DNS hostname instead of the endpoint-specific DNS hostnames. For example, if your existing applications make requests to an AWS service, they can continue to make requests through the interface endpoint without requiring any configuration changes.
The alternative is to update your application to use your endpoint-specific DNS hostnames.
Note that to use private DNS names, DNS resolution and DNS hostnames must be enabled for your VPC:
Also note that in order to use ECR/ECS without a NAT gateway, you need to configure a S3 endpoint (gateway, requires route table update) to allow instances to download the image layers from the underlying private Amazon S3 buckets that host them. More information in Setting up AWS PrivateLink for Amazon ECS, and Amazon ECR
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With