Restricting access to AWS S3 bucket based on referer

I'm trying to restrict access to an S3 bucket and only allow certain domains from a list, based on the Referer header.

The bucket policy is basically:

{
    "Version": "2012-10-17",
    "Id": "http referer domain lock",
    "Statement": [
        {
            "Sid": "Allow get requests originating from specific domains",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example.com/*",
            "Condition": {
                "StringLike": {
                    "aws:Referer": [
                        "*othersite1.com/*",
                        "*othersite2.com/*",
                        "*othersite3.com/*"
                    ]
                }
            }
        }
    ]
}

othersite1, 2, and 3 call an object that I have stored in my S3 bucket under the domain example.com. I also have a CloudFront distribution attached to the bucket. I'm using the * wildcard before and after each string in the condition, because the referer can be something like othersite1.com/folder/another-folder/page.html and may use either http or https.

I don't know why I'm getting a 403 Forbidden error.

I'm doing this because I don't want other sites to call that object.

Any help would be greatly appreciated.

asked Sep 02 '17 by esdrayker


1 Answer

As is necessary for correct caching behavior, CloudFront strips almost all request headers from a request before forwarding it to the origin server.

Referer | CloudFront removes the header.

http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior

So, if your bucket is trying to block requests based on the referring page, as is sometimes done to prevent hotlinking, S3 will not -- by default -- be able to see the Referer header, because CloudFront doesn't forward it.

And, this is a very good illustration of why CloudFront doesn't forward it. If CloudFront forwarded the header and then blindly cached the result, whether the bucket policy had the intended effect would depend on whether the first request was from one of the intended sites, or from elsewhere -- and other requesters would get the cached response, which might be the wrong response.

(tl;dr) Whitelisting the Referer header for forwarding to the origin (in the CloudFront Cache Behavior settings) solves this issue.
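
For reference, in the pre-cache-policy (legacy) CloudFront API, that whitelist lives in the ForwardedValues section of the cache behavior. Here's a minimal sketch of the relevant fragment of a DistributionConfig, assuming the rest of the behavior's settings are already in place:

"ForwardedValues": {
    "QueryString": false,
    "Cookies": { "Forward": "none" },
    "Headers": {
        "Quantity": 1,
        "Items": [ "Referer" ]
    }
}

The same setting is reachable in the console under the distribution's Cache Behavior settings, as "Whitelist" headers.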

But, there is a bit of a catch.

Now that you are forwarding the Referer header to S3, you've extended the cache key -- the list of things against which CloudFront caches responses -- to include the Referer header.

So, now, for each object, CloudFront will not serve a response from cache unless the incoming request's Referer header exactly matches one from an already-cached request... otherwise the request has to go to S3. And the catch with the Referer header is that it carries the referring page, not just the referring site, so each page on the authorized sites will end up with its own cached copy of these assets in CloudFront.

This, itself, is not a problem. There is no charge for these extra copies of objects, and this is how CloudFront is designed to work... the problem is, it reduces the likelihood of a given object being in a given edge cache, since each object will necessarily be referenced less. This becomes less significant -- to the point of insignificance -- if you have a large amount of traffic, and more significant if your traffic is smaller. Fewer cache hits means slower page loads and more requests going to S3.

There is not a correct answer to whether or not this is ideal for you, because it is very specific to exactly how you are using CloudFront and S3.

But, here's the alternative:

You can remove the Referer header from the whitelist of headers forwarded to S3, undoing the potential negative impact on the cache hit rate, by configuring CloudFront to fire a Lambda@Edge Viewer Request trigger that inspects each request as it comes in the front door and blocks any request that doesn't come from a referring page you want to allow.

A Viewer Request trigger fires after the specific Cache Behavior is matched, but before the actual cache is checked, and with most of the incoming headers still intact. You can allow the request to proceed, optionally with modifications, or you can generate a response and cancel the rest of the CloudFront processing. That's what I'm illustrating below -- if the host part of the Referer header isn't in the array of acceptable values, we generate a 403 response; otherwise, the request continues, the cache is checked, and the origin is consulted only as needed.

Firing this trigger adds a small amount of overhead to every request, but that overhead may amortize out to being more desirable than a reduced cache hit rate. So, the following is not a "better" solution -- just an alternate solution.

This is a Lambda function written in Node.js 6.10.

'use strict';

const allow_empty_referer = true;

const allowed_referers = ['example.com', 'example.net'];

exports.handler = (event, context, callback) => {

    // extract the original request, and the headers from the request
    const request = event.Records[0].cf.request;
    const headers = request.headers;

    // find the first referer header if present, and extract its value;
    // then take http[s]://<--this-part-->/only/not/the/path.
    // the || [])[0]) || {'value' : ''} construct is optimizing away some if(){ if(){ if(){ } } } validation;
    // the trailing || '' turns a missing referer into an empty string instead of undefined,
    // so the allow_empty_referer check below can match it

    const referer_host = (((headers.referer || [])[0]) || {'value' : ''})['value'].split('/')[2] || '';

    // compare to the list, and immediately allow the request to proceed through CloudFront 
    // if we find a match

    for(var i = allowed_referers.length; i--;)
    {
        if(referer_host == allowed_referers[i])
        {
            return callback(null,request);
        }
    }

    // also test for no referer header value if we allowed that, above
    // usually, you do want to allow this

    if(allow_empty_referer && referer_host === "")
    {
        return callback(null,request);
    }

    // we did not find a reason to allow the request, so we deny it.

    const response = {
        status: '403',
        statusDescription: 'Forbidden',
        headers: {
            'vary':          [{ key: 'Vary',          value: '*' }], // hint, but not too obvious
            'cache-control': [{ key: 'Cache-Control', value: 'max-age=60' }], // browser-caching timer
            'content-type':  [{ key: 'Content-Type',  value: 'text/plain' }], // can't return binary (yet?)
        },
        body: 'Access Denied\n',
    };

    callback(null, response);
};
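
As a quick sanity check, a function like this can be exercised locally before deploying it to Lambda@Edge. The snippet below is a hypothetical smoke test (the index.js filename and the sample URLs are assumptions, not part of the setup above) that feeds the handler fake viewer-request events and prints whether each one would be forwarded or denied:

'use strict';

// hypothetical local smoke test -- assumes the function above is saved as index.js
const { handler } = require('./index');

// build a minimal CloudFront viewer-request event, optionally with a Referer header
function makeEvent(referer) {
    const headers = {};
    if (referer) {
        headers.referer = [{ key: 'Referer', value: referer }];
    }
    return { Records: [{ cf: { request: { uri: '/image.png', headers: headers } } }] };
}

// a request allowed through is returned unchanged (no status field);
// a denied request comes back as the generated 403 response
function check(label, referer) {
    handler(makeEvent(referer), {}, (err, result) => {
        console.log(label + ':', result.status ? 'denied ' + result.status : 'forwarded to cache/origin');
    });
}

check('allowed site', 'https://example.com/some/page.html');      // forwarded
check('other site',   'https://hotlinker.example.org/page.html'); // denied 403
check('no referer',   null);                                      // forwarded if allow_empty_referer is true

Note that the no-referer case only passes because referer_host falls back to an empty string; without that fallback, split('/')[2] on an empty value would be undefined and the allow_empty_referer check would never match.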
answered Sep 26 '22 by Michael - sqlbot