
Why is Cloudfront evicting objects from cache within mere hours?

Cloudfront is configured to cache the images from our app. I found that the images were evicted from the cache very quickly. Since the images are generated dynamically on the fly, this puts a heavy load on our server. To investigate the issue I set up a test case.

Origin headers

The image is served from our origin server with correct Last-Modified and Expires headers.

[Image: origin headers]

Cloudfront cache behaviour

Since the site is HTTPS only I set the Viewer Protocol Policy to HTTPS. Forward Headers is set to None and Object Caching to Use Origin Cache Headers.

[Image: cloudfront cache behaviour settings]

The initial image request

I requested an image at 11:25:11. This returned the following status and headers:

  • Code: 200 (OK)
  • Cached: No
  • Expires: Thu, 29 Sep 2016 09:24:31 GMT
  • Last-Modified: Wed, 30 Sep 2015 09:24:31 GMT
  • X-Cache: Miss from cloudfront
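As a rough check on what these headers permit, the span between Last-Modified and Expires can be computed directly. This is only a sketch: `header_span` is a hypothetical helper name, and the span between these two headers is used here as an approximation of how long the origin allows caching (the exact freshness lifetime would be measured from the Date header, which is not shown above).

```python
from email.utils import parsedate_to_datetime


def header_span(last_modified: str, expires: str):
    """Return the timedelta between two HTTP-date header values."""
    return parsedate_to_datetime(expires) - parsedate_to_datetime(last_modified)


# Header values copied from the initial response above.
span = header_span(
    "Wed, 30 Sep 2015 09:24:31 GMT",
    "Thu, 29 Sep 2016 09:24:31 GMT",
)
print(span.days)  # 365
```

So the origin is asking for roughly a year of caching, which is far longer than the few hours after which the miss was observed.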

[Image: initial request headers]

A subsequent request

A reload a little while later (11:25:43) returned the image with:

  • Code: 304 (Not Modified)
  • Cached: Yes
  • Expires: Thu, 29 Sep 2016 09:24:31 GMT
  • X-Cache: Hit from cloudfront

[Image: subsequent request headers]

A request a few hours later

Nearly three hours later (at 14:16:11) I went to the same page and the image loaded with:

  • Code: 200 (OK)
  • Cached: Yes
  • Expires: Thu, 29 Sep 2016 09:24:31 GMT
  • Last-Modified: Wed, 30 Sep 2015 09:24:31 GMT
  • X-Cache: Miss from cloudfront


Since the image was still cached by the browser it loaded quickly. But I cannot understand why Cloudfront did not return the cached image. Because of the miss, the app had to generate the image again.

I read that Cloudfront evicts files from its cache after a few days of being inactive. This is not the case as demonstrated above. How could this be?

richard asked Sep 30 '15 18:09


1 Answer

"I read that Cloudfront evicts files from its cache after a few days of being inactive."

Do you have an official source for that?

Here's the official answer:

If an object in an edge location isn't frequently requested, CloudFront might evict the object—remove the object before its expiration date—to make room for objects that have been requested more recently.

http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html

There is no guaranteed retention time for cached objects, and objects with low demand are more likely to be evicted. But there is another factor you may not have considered: eviction may not be the issue, or not the only issue.

Objects cached by CloudFront are like Schrödinger's cat. It's a loose analogy, but I'm running with it: whether an object is "in the CloudFront cache" at any given instant is not a yes-or-no question.

CloudFront has somewhere around 53 edge locations (where your browser connects and the content is physically stored) in 37 cities. Some major cities have 2 or 3. Each request that hits CloudFront is routed (via DNS) to the most theoretically optimal location -- for simplicity, we'll call it the "closest" edge to where you are.

The internal workings of CloudFront are not public information, but the general consensus based on observations and presumably authoritative sources is that these edge locations are all independent. They don't share caches.

If, for example, you are in Texas (US) and your request was routed through and cached in Dallas/Fort Worth, TX, and if the odds are equal that any request from you could hit either of the Dallas edge locations, then until you get two misses of the same object, the odds are about 50/50 that your next request will be a miss. If I request that same object from my location, which I know from experience tends to route through South Bend, IN, then the odds of my first request being a miss are 100%, even though it's cached in Dallas.

So an object is not simply in, or not in, the cache, because there is no "the" (single, global) cache.
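The 50/50 figure can be reproduced with a toy model. This is only a sketch under simplifying assumptions that CloudFront does not document: each request lands on one of `edges` independent caches uniformly at random, and nothing is evicted in the meantime. The function name and parameters are made up for illustration.

```python
def miss_probability(edges: int, prior_requests: int) -> float:
    """Chance that the next request reaches an edge that none of the prior
    requests reached, assuming uniform random routing and no eviction."""
    if edges < 1 or prior_requests < 0:
        raise ValueError("edges must be >= 1 and prior_requests >= 0")
    return ((edges - 1) / edges) ** prior_requests


# Two Dallas edges, one earlier request: the next one is a coin flip.
print(miss_probability(2, 1))  # 0.5
# After three earlier requests the chance of another miss drops.
print(miss_probability(2, 3))  # 0.125
# A viewer routed to an edge nobody has used yet always misses first.
print(miss_probability(2, 0))  # 1.0
```

In this model, a low-traffic object spread over several edges sees misses far more often than a single shared cache would suggest.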

It is also possible that CloudFront's determination of the "closest" edge to your browser will change over time.

CloudFront's mechanism for determining the closest edge appears to be dynamic and adaptive. Changes in the topology of the Internet at large can shift which edge location will tend to receive requests sent from a given IP address, so it is possible that, over the course of a few hours, the edge you are connecting to will change. Maintenance or outages or other issues impacting a particular edge could also cause requests from a given source IP address to be sent to a different edge than the typical one, and this could also give you the impression of objects being evicted, since the new edge's cache would be different from the old.

Looking at the response headers, it isn't possible to determine which edge location handled each request. However, this information is provided in the CloudFront access logs.

I have a fetch-and-resize image service that handles around 750,000 images per day. It's behind CloudFront, and my hit/miss ratio is about 50/50. That is certainly not all CloudFront's fault, since my pool of images exceeds 8 million, the viewers are all over the world, and my max-age directive is shorter than yours. It has been quite some time since I last analyzed the logs to determine which "misses" seemed unexpected, and how many; when I did, there definitely were some, but their number was not unreasonable. That analysis is easy enough to do, since the logs tell you whether each response was a hit or a miss and identify the edge location, so you could check whether there's really a pattern here.
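A minimal sketch of that kind of log analysis follows. It assumes the tab-separated standard access log format, in which a `#Fields:` header line names the columns, and it looks up the `x-edge-location` and `x-edge-result-type` columns by name rather than by position. The sample rows, the shortened field list, and the edge codes AMS1 and FRA2 are invented for illustration; they are not taken from the question's real logs.

```python
from collections import Counter


def tally_results(lines):
    """Count (edge location, result type) pairs in access log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Column names are space-separated after the "#Fields:" prefix.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other comment lines and rows before the header
        row = dict(zip(fields, line.split("\t")))
        counts[(row["x-edge-location"], row["x-edge-result-type"])] += 1
    return counts


# Made-up sample with only a few of the real columns.
sample = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-status cs-uri-stem x-edge-result-type",
    "2015-09-30\t09:25:11\tAMS1\t200\t/img/1.jpg\tMiss",
    "2015-09-30\t09:25:43\tAMS1\t304\t/img/1.jpg\tHit",
    "2015-09-30\t12:16:11\tFRA2\t200\t/img/1.jpg\tMiss",
]
counts = tally_results(sample)
for (edge, result), n in sorted(counts.items()):
    print(edge, result, n)
```

If a later miss shows up under a different edge code than the earlier hit, as in this invented sample, that points to routing rather than eviction. Real log files are delivered gzip-compressed, so the lines would likely come from something like `gzip.open(path, "rt")`.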

My service stores all of its output content in S3, and when a new request comes in, it first sends a quick request to the S3 bucket to see if there is work that can be avoided. If a result is returned by S3, then that result is returned to CloudFront instead of doing all the fetching and resizing work, again. Mind you, I did not implement that capability because of the number of CloudFront misses... I designed that in from the beginning, before I ever even tested it behind CloudFront, because -- after all -- CloudFront is a cache, and the contents of a cache are pretty much volatile and ephemeral, by definition.
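The check-before-regenerating idea can be sketched as below. This is not the actual service code: `get_or_render`, `store`, and `fake_render` are hypothetical names, and a plain dict stands in for the S3 bucket so the example runs on its own.

```python
from typing import Callable, MutableMapping


def get_or_render(
    key: str,
    store: MutableMapping[str, bytes],
    render: Callable[[str], bytes],
) -> bytes:
    """Return stored output for `key` if present; otherwise render and store it."""
    cached = store.get(key)
    if cached is not None:
        return cached          # work avoided: reuse the earlier result
    result = render(key)       # the expensive fetch-and-resize step
    store[key] = result        # persist so later cache misses stay cheap
    return result


calls = []


def fake_render(key: str) -> bytes:
    """Stand-in for the expensive image work; records each invocation."""
    calls.append(key)
    return b"resized:" + key.encode()


store = {}  # stand-in for the S3 bucket
first = get_or_render("cat.jpg/200x200", store, fake_render)
second = get_or_render("cat.jpg/200x200", store, fake_render)
print(len(calls))  # 1
```

With a durable store behind the origin, a CloudFront miss costs one lookup instead of a full regeneration.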


Update: I stated above that it does not appear possible to identify the edge location forwarding a particular request by examining the request headers from CloudFront... however, it appears that it is possible with some degree of accuracy by examining the source IP address of the incoming request.

For example, a test request sent to one of my origin servers through CloudFront arrives from 54.240.144.13 if I hit my site from home, or 205.251.252.153 when I hit the site from my office -- the locations are only a few miles apart, but on opposite sides of a state boundary and using two different ISPs. A reverse DNS lookup of these addresses shows these hostnames:

server-54-240-144-13.iad12.r.cloudfront.net.
server-205-251-252-153.ind6.r.cloudfront.net.

CloudFront edge locations are named after the nearest major airport, plus an arbitrarily chosen number. For iad12 ... "IAD" is the International Air Transport Association (IATA) code for Washington, DC Dulles airport, so this is likely to be one of the edge locations in Ashburn, VA (which has three, presumably with different numerical codes at the end, but I can't confirm that from just this data). For ind6, "IND" matches the airport at Indianapolis, Indiana, so this strongly suggests that this request comes through the South Bend, IN, edge location. The reliability of this test would depend on the consistency with which CloudFront maintains its reverse DNS entries. It is not documented how many independent caches might be at any given edge location; the assumption is that there's only one, but there might be more than one, having the effect of increasing the miss ratio for very small numbers of requests, but disappearing into the mix for large numbers of requests.
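The hostname check above could be automated along these lines. This is a sketch that assumes the reverse DNS names keep the `server-<ip>.<code><number>.r.cloudfront.net` shape seen in the two examples, which is not a documented guarantee. `edge_from_hostname` and `edge_from_ip` are illustrative names.

```python
import re
import socket

# Pattern inferred from the two hostnames shown above; not a documented format.
EDGE_HOST = re.compile(r"^server-[\d-]+\.([a-z]{3})(\d+)\.r\.cloudfront\.net\.?$")


def edge_from_hostname(hostname: str):
    """Return (airport code, number) parsed from an edge hostname, or None."""
    match = EDGE_HOST.match(hostname.lower())
    if match is None:
        return None
    return match.group(1).upper(), int(match.group(2))


def edge_from_ip(ip: str):
    """Reverse-resolve `ip` and parse the edge code; None if lookup fails."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        return None
    return edge_from_hostname(hostname)


print(edge_from_hostname("server-54-240-144-13.iad12.r.cloudfront.net."))
print(edge_from_hostname("server-205-251-252-153.ind6.r.cloudfront.net."))
```

An origin could log `edge_from_ip` for the source address of each incoming request and compare the result across requests for the same object.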

Michael - sqlbot answered Oct 10 '22 16:10