
CloudFront URL rewriting/remapping so content has two URLs?

I'm managing a documentation website that has URLs in this pattern:

/product-foo/1.2.3/user-guide/system-requirements.html

I want to have two URLs to the page:

/product-foo/1.2.3/user-guide/system-requirements.html  
/product-foo/latest/user-guide/system-requirements.html

as can be done with an Apache Web server as documented at
http://httpd.apache.org/docs/2.4/rewrite/remapping.html

"Assume we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. However, we want that users of the old URL even not recognize that the pages was renamed - that is, we don't want the address to change in their browser.
Solution: We rewrite the old URL to the new one internally via the following rule:
RewriteEngine on
RewriteRule "^/foo\.html$" "/bar.html" [PT]"

The idea is that with each new product version, I'll update the rewrite rules to point the "latest" pattern at the documentation for the most recently released version. That way people can link either to the latest documentation or to a specific version.

Can this be done with CloudFront configuration? Can it be done with S3 alone, without CloudFront? Can it be done with AWS Lambda or Lambda@Edge? (Will the solution be subject to the Lambda@Edge bandwidth limits?) Can you provide a specific example solution?

rcrews asked Mar 04 '18 22:03



1 Answer

This can be done using a Lambda@Edge trigger. The Lambda@Edge limits on generated response size apply only when the Lambda function itself generates the response, that is, when it populates the body attribute of the response object with content it has created or obtained somewhere else. A function that only modifies the request is not generating a response, so those limits do not apply to it.

With an Origin Request trigger:

  • the trigger fires only after the cache is checked, and only when there is no cache hit (with a cache hit, the origin isn't contacted, so invoking the trigger isn't needed)
  • the trigger fires before the request is sent to the origin
  • you can modify the path that will be sent in the request to the origin
  • the response is cached under the path originally requested by the browser, not the modified path
  • the browser is not redirected, so the address bar does not change.¹

Fundamentally, all we need to do in the Lambda function is extract the request object, modify the URI² and tell CloudFront to continue processing the request, as modified. We're just rewriting part of the request in flight, and returning control to CloudFront.

The example below is almost certainly not the tidiest or most efficient way to handle a series of possible string manipulations, but it is adequate to illustrate what your code needs to accomplish, whatever mapping and matching mechanism you choose.

You could statically remap the values, or you could use any number of database strategies to look up the original path and find the correct, current destination to use.

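'use strict';

exports.handler = (event, context, callback) => {
    // The request CloudFront is about to send to the origin.
    const request = event.Records[0].cf.request;

    // Rewrite only the path; each pattern is anchored at the start, so at most one applies.
    request.uri = request.uri
        .replace(/^\/product-foo\/latest\//, '/product-foo/1.0.0/')
        .replace(/^\/product-bar\/latest\//, '/product-bar/3.2.1/')
        .replace(/^\/product-three\/latest\//, '/product-three/5.5.5/');

    // Hand the modified request back so CloudFront continues processing it.
    return callback(null, request);
};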

event.Records is always an array of exactly one member, and event.Records[0].cf contains all of the relevant information for this particular invocation. event.Records[0].cf.request is the original request. Modifying this object and supplying it as the second argument to the callback directs CloudFront to continue normal processing, using the modified request.
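To make the event shape concrete, here is a minimal sketch that invokes a handler locally with a hand-built event. It is only an illustration: the `fakeEvent` and `rewritten` names are made up, and the event contains just the fields used here, whereas a real CloudFront event carries more (headers, method, and so on).

```javascript
'use strict';

// Same rewrite idea as above, reduced to one product for brevity.
// In a deployed function this would be assigned to exports.handler.
const handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    request.uri = request.uri.replace(/^\/product-foo\/latest\//, '/product-foo/1.0.0/');
    callback(null, request);
};

// Hand-built stand-in for an origin-request event (assumption: trimmed to the fields used here).
const fakeEvent = {
    Records: [
        {
            cf: {
                request: {
                    uri: '/product-foo/latest/user-guide/system-requirements.html',
                    querystring: ''
                }
            }
        }
    ]
};

// The callback runs synchronously in this sketch, so `rewritten` is set before the log line.
let rewritten;
handler(fakeEvent, {}, (err, result) => {
    if (err) throw err;
    rewritten = result;
});

console.log(rewritten.uri);
// /product-foo/1.0.0/user-guide/system-requirements.html
```

Running something like this with Node before deploying is a cheap way to check the patterns against real paths from your site.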

In this example the first argument to the callback is always null, indicating that no exception occurred. If an exception is thrown, or the first argument otherwise isn't null, then CloudFront returns a generic error to the viewer. It does not display the exception, since that could contain a stack trace or other sensitive information that should not be exposed. The error is accessible in the Lambda logs.
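The static remapping mentioned earlier could also be table-driven, so that each release only changes one object. The following is a sketch rather than a definitive implementation: the `LATEST` table and the `resolveLatest` helper are hypothetical names, the version numbers are the same placeholders used above, and the try/catch shows one way of passing a non-null first argument to the callback.

```javascript
'use strict';

// Hypothetical table: product name -> version that "latest" should resolve to.
const LATEST = {
    'product-foo': '1.0.0',
    'product-bar': '3.2.1',
    'product-three': '5.5.5'
};

// Rewrites /<product>/latest/... to /<product>/<version>/...
// Paths that don't match, or products missing from the table, are returned unchanged.
const resolveLatest = (uri) => {
    const match = /^\/([^/]+)\/latest\/(.*)$/.exec(uri);
    if (!match || !Object.prototype.hasOwnProperty.call(LATEST, match[1])) {
        return uri;
    }
    return `/${match[1]}/${LATEST[match[1]]}/${match[2]}`;
};

// In a deployed function this would be assigned to exports.handler.
const handler = (event, context, callback) => {
    let request;
    try {
        request = event.Records[0].cf.request;
        request.uri = resolveLatest(request.uri);
    } catch (err) {
        // Non-null first argument: the viewer gets a generic error, details go to the logs.
        return callback(err);
    }
    return callback(null, request);
};
```

A database lookup would replace the `LATEST` object, at the cost of extra latency on every cache miss.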


¹unless, of course, the origin server actually responds with a redirect.

²what Lambda@Edge calls the "URI" is actually only the path. The complete URI is technically path + '?' + query string, but Lambda@Edge separates these two things.
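A small sketch of that separation: only `uri` is rewritten, and `querystring` is left exactly as it arrived. The request object here is a hand-built stand-in with just those two fields, and the `lang=en` value is made up for illustration.

```javascript
'use strict';

// Hand-built stand-in for event.Records[0].cf.request (assumption: only the two fields discussed here).
const request = {
    uri: '/product-foo/latest/user-guide/system-requirements.html',
    querystring: 'lang=en'
};

request.uri = request.uri.replace(/^\/product-foo\/latest\//, '/product-foo/1.0.0/');

// The path changed; the query string lives in a separate field and was not touched.
console.log(request.uri);
console.log(request.querystring);
```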

Michael - sqlbot answered Dec 31 '22 20:12
