What heuristics do browsers use to cache resources not explicitly set to be cachable?

13.2.2 Heuristic Expiration

Since origin servers do not always provide explicit expiration times, HTTP caches typically assign heuristic expiration times, employing algorithms that use other header values (such as the Last-Modified time) to estimate a plausible expiration time. The HTTP/1.1 specification does not provide specific algorithms, but does impose worst-case constraints on their results. Since heuristic expiration times might compromise semantic transparency, they ought to [be] used cautiously, and we encourage origin servers to provide explicit expiration times as much as possible. (HTTP/1.1, RFC 2616)

What are the algorithms used by browsers to estimate plausible expiration times?

The ideal answer will cover all major browsers with evidence from source code or official blog posts.

Asked by Rick Viscomi, Jan 15 '13 at 20:01

2 Answers

Let's assume all browsers we are interested in are Internet Explorer 8 or newer (e.g. IE5 has some terrible behaviour with caching headers).

There is only ONE standards-based way of controlling caching (introduced with HTTP/1.1): the Cache-Control HTTP header.

Since at least 1996 IE has been using an opt-out policy for caching HTTPS content.

Seemingly since its introduction Chrome has done opt-out for HTTPS (i.e. it will cache it unless told not to). In 2011 Firefox 4 (but not Safari) switched to opt-out caching for HTTPS content. Source.

Recommendations

  1. Only use HTTP headers to control browser caching. If you decide to go against this be aware that IE only recognizes two cache control directives that are set inside HTML:

    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">
    <META HTTP-EQUIV="Expires" CONTENT="-1">

    and seemingly only the former is useful in the HTTPS scenario. Further, there can be problems when trying to use Pragma in IE. Finally, Chrome ignores cache directives in meta tags, which reduces their usefulness even further.

  2. Don't use the Expires header. In modern browsers Expires is superseded by Cache-Control. Expires: 0 and Pragma: no-cache are technically invalid response headers: 0 is not a valid HTTP date, and Pragma: no-cache is only defined for requests. Yes, they have existed since the beginning, but not all modern browsers (e.g. Chrome) use them.

  3. The Vary header is a minefield: it behaves differently in older IEs and with XHR. Finding out the details is left as an exercise for the reader, and leaves the impression that it is preferable to use different URLs for different content.

  4. Allow the browser to make conditional requests by setting ETags. ETags allow a browser to do a lightweight check to see whether the content has changed, so it can avoid downloading the full response again if it hasn't.

  5. Be aware some browsers are just broken and need hacks. IE 8 can have issues downloading files which it has been told not to cache.
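
To make recommendations 1 and 4 concrete, here is a minimal Python sketch of a handler that sends an explicit Cache-Control lifetime plus an ETag, and answers a matching If-None-Match with 304. The names `make_etag` and `respond`, the hash-based ETag scheme and the one-hour `max_age` default are all assumptions for illustration, not taken from any browser or server. Weak validators (`W/"..."`) and `If-None-Match: *` are deliberately not handled.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None, max_age=3600):
    """Return (status, headers, body) with explicit caching headers."""
    etag = make_etag(body)
    headers = {
        # Explicit freshness lifetime, so no heuristic is needed.
        "Cache-Control": f"max-age={max_age}",
        # Validator the browser can send back in If-None-Match.
        "ETag": etag,
    }
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            # The cached copy is still current: no body is sent.
            return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"hello")
revalidated, _, _ = respond(b"hello", if_none_match=headers["ETag"])
print(status, revalidated)  # 200 304
```

The second call models the browser revalidating its cached copy: the server only compares validators and skips resending the body.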

Browser caching algorithms

  • Chrome 49.0.2606.2 HttpResponseHeaders::GetFreshnessLifetimes()
  • Firefox HTTP Caching FAQ, Firefox 38 ESR nsHttpResponseHead::ComputeFreshnessLifetime() .
  • Internet Explorer (6+?), HTTPS caching in IE 8+, Internet Explorer 9+, Internet Explorer 9+.
  • Webkit (Safari) computeFreshnessLifetimeForHTTPFamily()

See also

  • Google's browser caching recommendations.
Answered by Anon, Oct 05 '22 at 23:10

From Chromium's source code: https://code.google.com/p/chromium/codesearch#chromium/src/net/http/http_response_headers.cc&l=1082&rcl=1421094684

    if ((response_code_ == 200 || response_code_ == 203 ||
         response_code_ == 206) && !must_revalidate) {
      // TODO(darin): Implement a smarter heuristic.
      Time last_modified_value;
      if (GetLastModifiedValue(&last_modified_value)) {
        // The last-modified value can be a date in the future!
        if (last_modified_value <= date_value) {
          lifetimes.freshness = (date_value - last_modified_value) / 10;
          return lifetimes;
        }
      }
    }
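
As an illustration only (this is not Chromium code), the same rule can be sketched in Python: the freshness lifetime is 10% of the gap between the Date and Last-Modified headers. The function name `heuristic_freshness` is hypothetical, and returning a zero lifetime whenever the heuristic does not apply is an assumption of this sketch; the quoted snippet does not show what Chromium does in the remaining cases.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Status codes the quoted snippet applies the heuristic to.
HEURISTIC_STATUS_CODES = {200, 203, 206}


def heuristic_freshness(status, date_header, last_modified_header,
                        must_revalidate=False):
    """Return a heuristic freshness lifetime as a timedelta.

    Follows the quoted rule: 10% of (Date - Last-Modified).
    Assumption: a zero lifetime when the rule does not apply.
    """
    if status not in HEURISTIC_STATUS_CODES or must_revalidate:
        return timedelta(0)
    if not last_modified_header:
        return timedelta(0)
    date_value = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    # Last-Modified can be a date in the future; ignore it then.
    if last_modified > date_value:
        return timedelta(0)
    return (date_value - last_modified) / 10


lifetime = heuristic_freshness(
    200,
    "Tue, 15 Jan 2013 20:00:00 GMT",
    "Sat, 05 Jan 2013 20:00:00 GMT",
)
print(lifetime)  # 1 day, 0:00:00
```

So a resource last modified ten days before it was served, with no explicit expiration, would be treated as fresh for about one day under this rule.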
Answered by Rick Viscomi, Oct 05 '22 at 23:10