
What is the most performant way to serve index.html for a single-page application among all AWS services?

I'm trying to find the most performant way to serve the index.html file for a single-page application in AWS. The main requirements are:

  1. The AWS service must be able to serve the file from a wildcard domain such as *.domain.com.
  2. The SPA should avoid hash-based routing, meaning that a URL like https://foo.domain.com/path/to/resource is preferred over a URL like https://foo.domain.com/#/path/to/resource.

Serving the file straight from a Lambda-backed API Gateway seems infeasible because that approach doesn't satisfy the custom wildcard domain requirement.

We've tried, unsuccessfully, to use CloudFront backed by an S3 origin. To use an SPA with CloudFront and HTML5 (non-hash-based) path routing, you must specify CustomErrorResponses that serve the index.html file for HTTP status codes 404 and 403. While this does serve the index.html file correctly, responses always carry the x-cache: Error from cloudfront header. This means CloudFront spent time looking for the HTML5 path in the S3 origin before serving index.html as the default error document. On top of that, the distribution uses an origin-response Lambda@Edge function to add custom HTTP headers, which adds further latency to these non-cached responses.
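For reference, the error-response setup described above looks roughly like the following. This is a minimal sketch written as a Python dict in the shape the CloudFront API uses for the CustomErrorResponses part of a distribution config; the ErrorCachingMinTTL value of 0 is an assumption, not our actual setting.

```python
# Sketch of the CustomErrorResponses portion of a CloudFront distribution
# config. Both 403 and 404 from the S3 origin are rewritten to a 200 that
# serves /index.html. The ErrorCachingMinTTL value is an assumed placeholder.
custom_error_responses = {
    "Quantity": 2,
    "Items": [
        {
            "ErrorCode": code,
            "ResponsePagePath": "/index.html",
            "ResponseCode": "200",
            "ErrorCachingMinTTL": 0,
        }
        for code in (403, 404)
    ],
}
```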

In some regions of the US, we're seeing requests for this file take 500-1000 milliseconds. For example, with a CloudFront distribution whose S3 origin is hosted in Virginia and a viewer in the central US, a request seems to route from the viewer to the nearest edge location (sometimes farther west), travel to Virginia and back, and only then return from the edge location to the viewer.

We've also tried, unsuccessfully, to use Lambda@Edge to cache the error response body along with the headers.

What we haven't tried yet:

  1. An Application Load Balancer pointed at a Lambda function (either with or without an API Gateway).
  2. An Application Load Balancer pointed directly at an EC2 instance.

Before we decide to try these more expensive hosting options, I'm asking the community whether there is a way to make CloudFront more performant given our requirements. If not, I expect EC2 has the potential to be more performant than ALB/Lambda, since EC2 shouldn't suffer cold starts. Is that an accurate assumption?

danludwig asked Oct 16 '22 10:10


1 Answer

The solution is for the CloudFront distribution to define other cache behaviors in addition to the DefaultCacheBehavior (*).

Set up the default cache behavior with an origin-request Lambda@Edge association. When the origin-request function is invoked, it should return a response containing the index.html file contents along with any required headers. This response will be cached in the distribution for each requested virtual path. There are two ways for the Lambda@Edge function to obtain these contents:

  1. From within the function code, issue an HTTP(S) GET for the index.html file at the CloudFront URL (such as dklyksfhsksdgjh.cloudfront.net/index.html). The distribution will return the file through a different, non-default cache behavior that you will also set up. This approach is slower the first time any virtual HTML5 path is requested, though subsequent requests are served from the CloudFront distribution cache.

  2. Embed the contents of the index.html file in the Lambda@Edge function code during the function's build process. This approach performs better than option #1 because no network request is needed to obtain the file contents.
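As a rough illustration of the two options, here is a minimal sketch of an origin-request function in Python. The embedded markup, the Cache-Control value, and the function names are placeholders I made up for the example; the handler uses option #2, and fetch_index_html shows what option #1 might look like.

```python
import urllib.request

# Option 2: markup embedded at build time (placeholder, not a real page).
EMBEDDED_INDEX_HTML = '<!doctype html><html><body><div id="app"></div></body></html>'

# Option 1: the example distribution URL from above. /index.html must match a
# non-default cache behavior, otherwise this request would loop back here.
INDEX_URL = "https://dklyksfhsksdgjh.cloudfront.net/index.html"


def fetch_index_html(url=INDEX_URL, timeout=3):
    """Option 1: fetch index.html over HTTPS (costs a network round trip)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def build_response(body):
    """Build a generated response in the Lambda@Edge response format."""
    return {
        "status": "200",
        "statusDescription": "OK",
        "headers": {
            "content-type": [
                {"key": "Content-Type", "value": "text/html; charset=utf-8"}
            ],
            # Assumed TTL; choose a value that suits your deployment cadence.
            "cache-control": [{"key": "Cache-Control", "value": "max-age=300"}],
        },
        "body": body,
    }


def origin_request_handler(event, context):
    # Returning a response (rather than the request) means CloudFront does
    # not contact the origin for this path.
    return build_response(EMBEDDED_INDEX_HTML)
```

To try option #1 instead, the handler could call build_response(fetch_index_html()), ideally keeping the fetched body in a module-level variable so warm invocations skip the network request.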

Additionally, set up another cache behavior for the path pattern /index.html with an origin-response Lambda@Edge association. When the origin-response function is invoked, it should add any required headers to the response.
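A minimal sketch of such an origin-response function in Python might look like this. The header names and values are only examples I picked; substitute whatever headers your application actually requires.

```python
# Hypothetical header set; replace with the headers your application needs.
REQUIRED_HEADERS = {
    "Strict-Transport-Security": "max-age=63072000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
}


def origin_response_handler(event, context):
    # The response CloudFront received from the S3 origin.
    response = event["Records"][0]["cf"]["response"]
    for name, value in REQUIRED_HEADERS.items():
        # Lambda@Edge keys the headers dict by lowercase header name.
        response["headers"][name.lower()] = [{"key": name, "value": value}]
    return response
```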

If the distribution contains other files (such as /robots.txt, /favicon.ico, /fonts, /scripts, /styles, etc.), set up additional cache behaviors that match those paths. This is required so that requests for those files do not fall through to the default cache behavior, whose origin-request Lambda@Edge function would answer them with the index.html file.

With this approach, requests for the root of the application (i.e. www.site.com or www.site.com/index.html) will match the /index.html cache behavior, obtain the file from its S3 origin, add any required headers via the origin-response Lambda@Edge association, and cache the file. The first response should contain an x-cache: Miss from cloudfront header, and subsequent responses should contain x-cache: Hit from cloudfront until the cache TTL expires.

Requests for other files (such as /robots.txt, /scripts/myscript.js, etc.) will match the other cache behaviors that you define for those file paths in the distribution.

Requests for virtual HTML5 paths (e.g. www.site.com/path/handled/by/javascript) will match the default cache behavior and, because of the origin-request Lambda@Edge association, return the index.html contents without checking for any files in the S3 origin. The origin-request function still needs to add any required headers itself, just as the origin-response Lambda@Edge association does for the /index.html cache behavior. The responses will be cached, though each virtual HTML5 path is cached separately. For example, a request for /foo and a request for /bar will each invoke the origin-request Lambda@Edge function once before being cached.
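To make the routing above concrete, here is a small Python simulation of how requests map onto the cache behaviors. It is only an approximation: fnmatchcase stands in for CloudFront's path-pattern matching, the behavior names are invented labels, and it does not model a default root object.

```python
from fnmatch import fnmatchcase

# Ordered (path pattern, label) pairs; the labels are made up for this sketch.
BEHAVIORS = [
    ("/index.html", "s3 + origin-response headers"),
    ("/robots.txt", "s3 static file"),
    ("/favicon.ico", "s3 static file"),
    ("/fonts/*", "s3 static file"),
    ("/scripts/*", "s3 static file"),
    ("/styles/*", "s3 static file"),
]

# The default (*) behavior: origin-request function returns index.html.
DEFAULT_BEHAVIOR = "origin-request returns index.html"


def match_behavior(path):
    """Return the label of the first behavior whose pattern matches path."""
    for pattern, label in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return label
    return DEFAULT_BEHAVIOR


for path in ("/index.html", "/scripts/myscript.js", "/foo", "/bar"):
    print(path, "->", match_behavior(path))
```

Both /foo and /bar land on the default behavior, which is why each of them triggers the origin-request function once before its response is cached.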

danludwig answered Oct 27 '22 10:10