Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Serve index file instead of download prompt

I have my website hosted on S3 with CloudFront as a CDN, and I need these two URLs to behave the same and to serve the index.html file within the directory:

example.com/directory example.com/directory/

The one with the / at the end incorrectly prompts the browser to download a zero byte file with a random hash for the name of the file. Without the slash it returns my 404 page.

How can I get both paths to deliver the index.html file within the directory?

If there's a way I'm "supposed" to do this, great! That's what I'm hoping for, but if not I'll probably try to use Lambda@Edge to do a redirect. I need that for some other situations anyway, so some instructions on how to do a 301 or 302 redirect from Lambda@Edge would be helpful too : )

Update (as per John Hanley's Comment)

curl -i https://www.example.com/directory/

HTTP/2 200 
content-type: application/x-directory
content-length: 0
date: Sat, 12 Jan 2019 22:07:47 GMT
last-modified: Wed, 31 Jan 2018 00:44:16 GMT
etag: "[id]"
accept-ranges: bytes
server: AmazonS3
x-cache: Miss from cloudfront
via: 1.1 [id].cloudfront.net (CloudFront)
x-amz-cf-id: [id]

Update

CloudFront has one behavior set, forwarding http to https and sending the requests to S3. It also has a 404 error route under the errors tab.

like image 774
Costa Michailidis Avatar asked Jan 12 '19 21:01

Costa Michailidis


3 Answers

S3 only offers automatic index documents when you've enabled and are using the web site hosting features of the bucket, by pointing to the bucket's website hosting endpoint, ${bucket}.s3-website.${region}.amazonaws.com rather than the generic REST endpoint of the bucket, ${bucket}.s3.amazonaws.com.

Web site endpoints and REST endpoints have numerous differences, including this one.

The reason you're seeing these 0-byte files for object keys ending in / is because you are creating folder objects in the bucket using the S3 console or another utility that actually creates the 0-byte objects. They aren't needed, once the folders have objects "in" them -- but they're the only way to display an empty folder in the S3 console, which displays an object named foo/ as a folder named foo, even if there are no other objects with a key prefix of foo/. It's part of the visual emulation of a folder hierarchy in the console, even though objects in S3 are never really "in" folders.

If for some reason you need to use the REST endpoint -- such as you don't want to make the bucket public -- then you need two Lambda@Edge triggers in CloudFront, to emulate this functionality fairly closely.

An Origin Request trigger can inspect and modify requests after the CloudFront cache is checked, before the request is sent to the origin. We use this to check for a path ending in / and append index.html if we find that.

An Origin Response trigger can inspect and potentially modify responses, before they are written into the CloudFront cache. The Origin Response trigger can also inspect the original request that preceded the request that generated the response. We use this to check whether the response is an error. If it is, and the original request does not appear to be for an index document or a file (specifically, after the final slash in the path, a "file" has at least one character, followed by a dot, followed by at least one more character -- and if so, that's probably a "file"). If it's neither one of those things, we redirect to the original path plus a final / that we append.

Origin Request and Origin Response triggers fire only on cache misses. When there is a cache hit, neither trigger fires, because they are on the origin side of CloudFront -- the back side of the cache. Requests that can be served from the cache are served from the cache, so the triggers are not invoked.

The following is a Lambda@Edge function written in Node.js 8.10. This one Lambda function modifies its behavior so that it it behaves as either origin request or origin response, depending on context. After publishing a version in Lambda, associate that version's ARN with the CloudFront Cache Behavior settings as both an Origin Request and an Origin Response trigger.

'use strict';

// combination origin-request, origin-response trigger to emulate the S3
// website hosting index document functionality, while using the REST
// endpoint for the bucket

// https://stackoverflow.com/a/54263794/1695906

const INDEX_DOCUMENT = 'index.html'; // do not prepend a slash to this value

const HTTP_REDIRECT_CODE = '302'; // or use 301 or another code if desired
const HTTP_REDIRECT_MESSAGE = 'Found'; 

exports.handler = (event, context, callback) => {
    const cf = event.Records[0].cf;

    if(cf.config.eventType === 'origin-request')
    {
        // if path ends with '/' then append INDEX_DOCUMENT before sending to S3
        if(cf.request.uri.endsWith('/'))
        {
            cf.request.uri = cf.request.uri + INDEX_DOCUMENT;
        }
        // return control to CloudFront, to send request to S3, whether or not
        // we modified it; if we did, the modified URI will be requested.
        return callback(null, cf.request);
    }
    else if(cf.config.eventType === 'origin-response')
    {
        // is the response 403 or 404?  If not, we will return it unchanged.
        if(cf.response.status.match(/^40[34]$/))
        {
            // it's an error.

            // we're handling a response, but Lambda@Edge can still see the attributes of the request that generated this response; so, we
            // check whether this is a page that should be redirected with a trailing slash appended.  If it doesn't look like an index
            // document request, already, and it doesn't end in a slash, and doesn't look like a filename with an extension... we'll try that.

            // This is essentially what the S3 web site endpoint does if you hit a nonexistent key, so that the browser requests
            // the index with the correct relative path, except that S3 checks whether it will actually work.  We are using heuristics,
            // rather than checking the bucket, but checking is an alternative.

            if(!cf.request.uri.endsWith('/' + INDEX_DOCUMENT) && // not a failed request for an index document
               !cf.request.uri.endsWith('/') && // unlikely, unless this code is modified to pass other things through on the request side
               !cf.request.uri.match(/[^\/]+\.[^\/]+$/)) // doesn't look like a filename  with an extension
            {
                // add the original error to the response headers, for reference/troubleshooting
                cf.response.headers['x-redirect-reason'] = [{ key: 'X-Redirect-Reason', value: cf.response.status + ' ' + cf.response.statusDescription }];
                // set the redirect code
                cf.response.status = HTTP_REDIRECT_CODE;
                cf.response.statusDescription = HTTP_REDIRECT_MESSAGE;
                // set the Location header with the modified URI
                // just append the '/', not the "index.html" -- the next request will trigger
                // this function again, and it will be added without appearing in the
                // browser's address bar.
                cf.response.headers['location'] = [{ key: 'Location', value: cf.request.uri + '/' }];
                // not strictly necessary, since browsers don't display it, but remove the response body with the S3 error XML in it
                cf.response.body = '';
            }
        }

        // return control to CloudFront, with either the original response, or
        // the modified response, if we modified it.

        return callback(null, cf.response);

    }
    else // this is not intended as a viewer-side trigger.  Throw an exception, visible only in the Lambda CloudWatch logs and a 502 to the browser.
    {
        return callback(`Lambda function is incorrectly configured; triggered on '${cf.config.eventType}' but expected 'origin-request' or 'origin-response'`);
    }

};
like image 79
Michael - sqlbot Avatar answered Nov 10 '22 14:11

Michael - sqlbot


The answers given are wrong. Cloudfront has its own configuration to have www.yourdomain.com/ serve up a document. It's called "default root object" and its config is found under the "general" tab of your cloudfront distribution. Here are the full steps for getting an SSL/https-enabled custom domain + cloudfront + s3 bucket.

  1. Create a brand new S3 bucket with default (closed-off) permissions or remove all public access from the target bucket.
  2. Disable static website hosting. You don't need it.
  3. If you haven't already, get your SSL cert into Amazon so you can attach it to the cloudfront distribution which will be pointing to your S3 bucket.
  4. Create a cloudfront distribution pointing to the target S3 bucket, utilizing the cert.
  5. For the origin configuration, use the www.yourdomain.com.s3.amazonaws.com form for the origin, NOT the static website hosting URL (which should be disabled anyway).
  6. Let the cloudfront config automatically change the S3 bucket access ("restrict bucket access"). You want access to the bucket restricted to this cloudfront distribution ONLY (via a specific identity). No one should be hitting your S3 bucket directly, especially since it can serve via http (no "s").
  7. Under the cloudfront "general" tab (or during setup) set your default root object to "index.html" or whatever. Otherwise, requests to https://www.yourdomain.com/ will show permission denied.
like image 22
fivedogit Avatar answered Nov 10 '22 14:11

fivedogit


Recently AWS has recently launched CloudFront Functions which can be used for this use case. CloudFront Functions are cheaper, faster and easier to implement and test compared to Lambda@Edge.

Below is a sample function to attach index.html to the request if it is not provided while accessing the path.

function handler(event) {
    var request = event.request;
    var uri = request.uri;
    
    // Check whether the URI is missing a file name.
    if (uri.endsWith('/')) {
        request.uri += 'index.html';
    } 
    // Check whether the URI is missing a file extension.
    else if (!uri.includes('.')) {
        request.uri += '/index.html';
    }

    return request;
}

This will not append index.html in the web browser address bar, which gives a cleaner URL while browsing. In your case https://www.example.com/directory/ will remain as such while browsing, but will render the content of https://www.example.com/directory/index.html.

More samples can be found in https://github.com/aws-samples/amazon-cloudfront-functions/blob/main/url-rewrite-single-page-apps/index.js

like image 3
jeninjames Avatar answered Nov 10 '22 12:11

jeninjames