Restricting access to AWS S3 bucket based on referer

I'm trying to restrict access to an S3 bucket and only allow certain domains from a list, based on the Referer header.

The bucket policy is basically:

{
    "Version": "2012-10-17",
    "Id": "http referer domain lock",
    "Statement": [
        {
            "Sid": "Allow get requests originating from specific domains",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example.com/*",
            "Condition": {
                "StringLike": {
                    "aws:Referer": [
                        "*othersite1.com/*",
                        "*othersite2.com/*",
                        "*othersite3.com/*"
                    ]
                }
            }
        }
    ]
}

othersite1, 2, and 3 call an object that I have stored in my S3 bucket under the domain example.com. I also have a CloudFront distribution attached to the bucket. I'm using the * wildcard before and after each string in the condition, because the referer can be something like othersite1.com/folder/another-folder/page.html and may use either http or https.

I don't know why I'm getting a 403 Forbidden error.

I'm doing this because I don't want other sites to call that object.

Any help would be greatly appreciated.

asked Sep 02 '17 by esdrayker


1 Answer

As is necessary for correct caching behavior, CloudFront strips almost all request headers from a request before forwarding it to the origin server.

Referer | CloudFront removes the header.

http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior

So, if your bucket is trying to block requests based on the referring page, as is sometimes done to prevent hotlinking, S3 will not -- by default -- be able to see the Referer header, because CloudFront doesn't forward it.

And, this is a very good illustration of why CloudFront doesn't forward it. If CloudFront forwarded the header and then blindly cached the result, whether the bucket policy had the intended effect would depend on whether the first request was from one of the intended sites, or from elsewhere -- and other requesters would get the cached response, which might be the wrong response.

(tl;dr) Whitelisting the Referer header for forwarding to the origin (in the CloudFront Cache Behavior settings) solves this issue.
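
For reference, in the pre-cache-policy (legacy) CloudFront API, that whitelist lives in the ForwardedValues section of the cache behavior. Here's a minimal sketch of the relevant fragment of a DistributionConfig, assuming the rest of the behavior's settings are already in place:

"ForwardedValues": {
    "QueryString": false,
    "Cookies": { "Forward": "none" },
    "Headers": {
        "Quantity": 1,
        "Items": [ "Referer" ]
    }
}

The same setting is reachable in the console under the distribution's Cache Behavior settings, as "Whitelist" headers.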

But, there is a bit of a catch.

Now that you are forwarding the Referer header to S3, you've extended the cache key -- the list of things against which CloudFront caches responses -- to include the Referer header.

So, now, for each object, CloudFront will not serve a response from cache unless the incoming request's Referer header exactly matches one from an already-cached request... otherwise the request has to go to S3. And the catch with the Referer header is that it carries the referring page, not just the referring site, so each page on the authorized sites will end up with its own cached copy of these assets in CloudFront.

This, itself, is not a problem. There is no charge for these extra copies of objects, and this is how CloudFront is designed to work... the problem is, it reduces the likelihood of a given object being in a given edge cache, since each object will necessarily be referenced less. This becomes less significant -- to the point of insignificance -- if you have a large amount of traffic, and more significant if your traffic is smaller. Fewer cache hits means slower page loads and more requests going to S3.

There is not a correct answer to whether or not this is ideal for you, because it is very specific to exactly how you are using CloudFront and S3.

But, here's the alternative:

You can remove the Referer header from the whitelist of headers forwarded to S3, undoing the potential negative impact on the cache hit rate, by configuring CloudFront to fire a Lambda@Edge Viewer Request trigger that inspects each request as it comes in the front door and blocks any request that doesn't come from a referring page you want to allow.

A Viewer Request trigger fires after the specific Cache Behavior is matched, but before the actual cache is checked, and with most of the incoming headers still intact. You can allow the request to proceed, optionally with modifications, or you can generate a response and cancel the rest of the CloudFront processing. That's what I'm illustrating below -- if the host part of the Referer header isn't in the array of acceptable values, we generate a 403 response; otherwise, the request continues, the cache is checked, and the origin is consulted only as needed.

Firing this trigger adds a small amount of overhead to every request, but that overhead may amortize out to being more desirable than a reduced cache hit rate. So, the following is not a "better" solution -- just an alternate solution.

This is a Lambda function written in Node.js 6.10.

'use strict';

const allow_empty_referer = true;

const allowed_referers = ['example.com', 'example.net'];

exports.handler = (event, context, callback) => {

    // extract the original request, and the headers from the request
    const request = event.Records[0].cf.request;
    const headers = request.headers;

    // find the first referer header if present, and extract its value;
    // then take http[s]://<--this-part-->/only/not/the/path.
    // the || [])[0]) || {'value' : ''} construct is optimizing away some if(){ if(){ if(){ } } } validation;
    // the trailing || '' turns a missing referer into an empty string instead of undefined,
    // so the allow_empty_referer check below can match it

    const referer_host = (((headers.referer || [])[0]) || {'value' : ''})['value'].split('/')[2] || '';

    // compare to the list, and immediately allow the request to proceed through CloudFront 
    // if we find a match

    for(var i = allowed_referers.length; i--;)
    {
        if(referer_host == allowed_referers[i])
        {
            return callback(null,request);
        }
    }

    // also test for no referer header value if we allowed that, above
    // usually, you do want to allow this

    if(allow_empty_referer && referer_host === "")
    {
        return callback(null,request);
    }

    // we did not find a reason to allow the request, so we deny it.

    const response = {
        status: '403',
        statusDescription: 'Forbidden',
        headers: {
            'vary':          [{ key: 'Vary',          value: '*' }], // hint, but not too obvious
            'cache-control': [{ key: 'Cache-Control', value: 'max-age=60' }], // browser-caching timer
            'content-type':  [{ key: 'Content-Type',  value: 'text/plain' }], // can't return binary (yet?)
        },
        body: 'Access Denied\n',
    };

    callback(null, response);
};
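
As a quick sanity check, a function like this can be exercised locally before deploying it to Lambda@Edge. The snippet below is a hypothetical smoke test (the index.js filename and the sample URLs are assumptions, not part of the setup above) that feeds the handler fake viewer-request events and prints whether each one would be forwarded or denied:

'use strict';

// hypothetical local smoke test -- assumes the function above is saved as index.js
const { handler } = require('./index');

// build a minimal CloudFront viewer-request event, optionally with a Referer header
function makeEvent(referer) {
    const headers = {};
    if (referer) {
        headers.referer = [{ key: 'Referer', value: referer }];
    }
    return { Records: [{ cf: { request: { uri: '/image.png', headers: headers } } }] };
}

// a request allowed through is returned unchanged (no status field);
// a denied request comes back as the generated 403 response
function check(label, referer) {
    handler(makeEvent(referer), {}, (err, result) => {
        console.log(label + ':', result.status ? 'denied ' + result.status : 'forwarded to cache/origin');
    });
}

check('allowed site', 'https://example.com/some/page.html');      // forwarded
check('other site',   'https://hotlinker.example.org/page.html'); // denied 403
check('no referer',   null);                                      // forwarded if allow_empty_referer is true

Note that the no-referer case only passes because referer_host falls back to an empty string; without that fallback, split('/')[2] on an empty value would be undefined and the allow_empty_referer check would never match.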
answered Sep 26 '22 by Michael - sqlbot