Amazon CloudFront: private content but maximise local browser caching

For JPEG image delivery in my web app, I am considering using Amazon S3 (or Amazon CloudFront if it turns out to be the better option), but I have two, possibly opposing, requirements:

  1. The images are private content; I want to use signed URLs with short expiration times.
  2. The images are large; I want them cached long-term by users' browsers.

The approach I'm thinking of is:

  1. User requests www.myserver.com/the_image
  2. Logic on my server determines the user is allowed to view the image. If they are allowed...
  3. Redirect the browser (is HTTP 307 best?) to a signed CloudFront URL
  4. Signed CloudFront URL expires in 60 seconds, but its response includes "Cache-Control: max-age=31536000, private"
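
The four steps above might look like the following minimal sketch. The domain, the helper name, and the signature value are all illustrative assumptions: real CloudFront signing RSA-signs a policy with your CloudFront key pair and also appends a Key-Pair-Id parameter, and the long-lived Cache-Control header is configured on the S3/CloudFront response, not on the redirect itself.

```python
import time
from urllib.parse import urlencode

def sign_cloudfront_url(resource: str, expires_at: int) -> str:
    # Illustrative stand-in only: a real implementation RSA-SHA1-signs a
    # policy with your CloudFront key pair and appends the result plus a
    # Key-Pair-Id parameter. "dxxxx" and the signature value are fake.
    query = urlencode({"Expires": expires_at, "Signature": "signature"})
    return f"https://dxxxx.cloudfront.net/{resource}?{query}"

def handle_image_request(user_is_authorized: bool, resource: str):
    # Steps 2-4: check access on the server, then redirect the browser
    # to a short-lived signed URL.
    if not user_is_authorized:
        return 403, {}
    expires_at = int(time.time()) + 60            # step 4: 60-second expiry
    return 307, {"Location": sign_cloudfront_url(resource, expires_at)}

status, headers = handle_image_request(True, "the_image.jpg")
```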

The problem I foresee is that the next time the page loads, the browser will look for www.myserver.com/the_image, but its cache entry will be for the signed CloudFront URL. My server will return a different signed CloudFront URL the second time, due to the very short expiration times, so the browser won't know it can use its cache.

Is there a way around this without having my web server proxy the image from CloudFront (which obviously negates all the benefits of using CloudFront)?

Wondering if there may be something I could do with ETag and HTTP 304, but I can't quite join the dots...

asked Jan 27 '13 by Mike




1 Answer

To summarize, you have private images you'd like to serve through Amazon CloudFront via signed URLs with a very short expiration. However, while access via a particular URL may be time-limited, it is desirable that the client serve the image from its cache on subsequent requests, even after the URL has expired.

Regardless of how the client arrives at the CloudFront URL (directly or via some server redirect), the client's cached copy of the image is associated only with the particular URL that was used to request it (and not with any other URL).

For example, suppose your signed url is the following (expiry timestamp shortened for example purposes):

http://[domain].cloudfront.net/image.jpg?Expires=1000&Signature=[Signature]

If you'd like the client to benefit from caching, you have to send it to the same URL. You cannot, for example, direct the client to the following URL and expect it to use a cached response from the first URL:

http://[domain].cloudfront.net/image.jpg?Expires=5000&Signature=[Signature]

There are currently no cache control mechanisms to get around this, including ETag, Vary, etc. The nature of client caching on the web is that a resource in cache is associated with a URL, and the purpose of those other mechanisms is to help the client determine when its cached version of a resource identified by a particular URL is still fresh.
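
To see why, picture the browser cache as a map keyed by the full URL, query string included (a simplified model; real caches also factor in things like the request method and Vary headers, and the domain below is a placeholder):

```python
# Simplified model of a browser cache: entries are keyed by the full URL,
# query string included, so two signed URLs for the same object are two
# distinct cache entries.
cache = {}

url_v1 = "http://dxxxx.cloudfront.net/image.jpg?Expires=1000&Signature=AAA"
url_v2 = "http://dxxxx.cloudfront.net/image.jpg?Expires=5000&Signature=BBB"

cache[url_v1] = b"...jpeg bytes..."  # response to the first request

hit = url_v2 in cache  # the second visit uses a fresh signed URL: cache miss
```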

You're therefore stuck in a situation where, to benefit from a cached response, you have to send the client to the same URL as the first request. There are potential ways to accomplish this (cookies, local storage, server scripting, etc.), so let's suppose that you have implemented one.

You next have to consider that caching is only a suggestion, not a guarantee. If you expect the client to have the image cached and serve it the original URL to benefit from that caching, you run the risk of a cache miss. In the case of a cache miss after the URL's expiry time, the original URL is no longer valid, and the client is left unable to display the image (from the cache or from the provided URL).

The behavior you're looking for simply cannot be provided by conventional caching when the expiry time is in the URL.

Since the desired behavior cannot be achieved, you might consider your next best options, each of which will require giving up on one aspect of your requirement. In the order I would consider them:

  1. If you give up short expiry times, you could use longer expiry times and rotate URLs. For example, you might set the URL expiry to midnight and then serve that same URL for all requests that day. Your client will benefit from caching for the day, which is likely better than none at all. The obvious disadvantage is that your URLs are valid for longer.

  2. If you give up content delivery, you could serve the images from a server which checks for access with each request. Clients will be able to cache the resource for as long as you want, which may be better than content delivery depending on the frequency of cache hits. A variation of this is to trade Amazon CloudFront for another provider, since there may be other content delivery networks that support this behavior (although I don't know of any). The loss of the content delivery network may be a disadvantage, or it may not matter much, depending on your specific visitors.

  3. If you give up the simplicity of a single static HTTP request, you could use client-side scripting to determine the request(s) that should be made. For example, in JavaScript you could attempt to retrieve the resource using the original URL (to benefit from caching), and if that fails (due to a cache miss and a lapsed expiry), request a new URL to use for the resource. A variation of this is to use some caching mechanism other than the browser cache, such as local storage. The disadvantage here is increased complexity and a compromised ability for the browser to prefetch.
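
Option 1's rotation can be implemented by pinning the Expires value to a fixed boundary rather than to the request time, so every request on the same day yields the same signed URL. A sketch (UTC midnight is an arbitrary choice of boundary):

```python
from datetime import datetime, timedelta, timezone

def next_midnight_utc(now: datetime) -> int:
    # Pin the expiry to the next UTC midnight, so every request on the
    # same day produces the same Expires value -- and thus, with the same
    # key and resource, the same signed URL.
    midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp())

morning = datetime(2013, 1, 27, 9, 0, tzinfo=timezone.utc)
evening = datetime(2013, 1, 27, 21, 0, tzinfo=timezone.utc)
```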
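
Option 3's fallback flow would live in browser JavaScript; it is sketched below in Python, with a hypothetical `fetch` helper standing in for the browser, just to show the control flow:

```python
def fetch(url: str, cache: dict, now: int) -> bytes:
    # Hypothetical fetch: serves from cache on an exact-URL match,
    # otherwise enforces the Expires query parameter like CloudFront would.
    if url in cache:
        return cache[url]
    expires = int(url.split("Expires=")[1].split("&")[0])
    if now > expires:
        raise PermissionError("403: URL expired")
    cache[url] = b"jpeg"
    return cache[url]

def load_image(original_url: str, get_fresh_url, cache: dict, now: int) -> bytes:
    # Client-side strategy: try the original URL first (a cache hit still
    # works after expiry); on failure, fall back to a freshly signed URL.
    try:
        return fetch(original_url, cache, now)
    except PermissionError:
        return fetch(get_fresh_url(), cache, now)
```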

answered Sep 22 '22 by Michael Petito