
CDN support/configuration for serving "stale" content, refreshing in background

Goal

Always serve content from a CDN EDGE cache, regardless of how stale. Refresh it in the background when possible.

Problem

I have a NextJS app that renders some React components server-side and delivers them to the client. For this discussion, let's just consider my homepage, which is unauthenticated and the same for everyone.

What I'd like is for the server rendered homepage to be cached at a CDN's EDGE nodes and served to end clients from that cache as often as possible, or always.

From what I've read, CDNs (like Fastly) which properly support cache related header settings like Surrogate-Control and Cache-Control: stale-while-revalidate should be able to do this, but in practice, I'm not seeing this working like I'd expect. I'm seeing either:

  • requests miss the cache and return to the origin when a prior request should have warmed it
  • requests are served from cache, but never get updated when the origin publishes new content

Example

Consider the following timeline:


[T0] - Visitor1 requests www.mysite.com - The CDN cache is completely cold, so the request must go back to my origin (AWS Lambda) and recompute the homepage. A response is returned with the headers Surrogate-Control: max-age=100 and Cache-Control: public, no-store, must-revalidate. Visitor1 is then served the homepage, but they had to wait a whopping 5 seconds! YUCK! May no other visitor ever have to suffer the same fate.

[T50] - Visitor2 requests www.mysite.com - The CDN cache contains my document and returns it to the visitor immediately. They only had to wait 40ms! Awesome. In the background, the CDN refetches the latest version of the homepage from my origin. Turns out it hasn't changed.

[T80] - www.mysite.com publishes new content to the homepage, making any cached content truly stale. V2 of the site is now live!

[T110] - Visitor1 returns to www.mysite.com - From the CDN's perspective, it's only been 60s since Visitor2's request, which means the background refresh initiated by Visitor2 should have resulted in a <100s stale copy of the homepage in the cache (albeit V1, not V2, of the homepage). Visitor1 is served the 60s-stale V1 homepage from cache. A much better experience for Visitor1 this time! This request initiates a background refresh of the stale content in the CDN cache, and this time the origin returns V2 of the website (which was published 30s ago).

[T160] - Visitor3 visits www.mysite.com - Despite being a new visitor, the CDN cache is now fresh from Visitor1's most recent trigger of a background refresh. Visitor3 is served a cached V2 homepage.

...

As long as at least 1 visitor comes to my site every 100s (because max-age=100), no visitor will ever suffer the wait time of a full roundtrip to my origin.


Questions

1. Is this a reasonable ask of a modern CDN? I can't imagine this is more taxing than always returning to the origin (no CDN cache), but I've struggled to find documentation from any CDN provider about the right way to do this. I'm working with Fastly now, but am willing to try others as well (I tried Cloudflare first, but read that they don't support stale-while-revalidate).

2. What are the right headers to do this with (assuming the CDN provider supports them)? I've played around with both Surrogate-Control: max-age=<X> and Cache-Control: public, s-maxage=<X>, stale-while-revalidate in Fastly and Cloudflare, but neither seems to do this correctly (requests well within the max-age timeframe don't pick up changes on the origin until there is a cache miss).
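For reference, the header shape I'm describing (values illustrative; Fastly documents stale-while-revalidate as a Surrogate-Control extension, and strips Surrogate-Control from the response before it reaches clients) would be something like:

```http
HTTP/1.1 200 OK
Surrogate-Control: max-age=100, stale-while-revalidate=86400
Cache-Control: no-store
```

Here stale-while-revalidate=86400 is meant to let the edge keep serving a copy up to a day past max-age while it revalidates in the background, and Cache-Control: no-store keeps browsers from caching the page themselves.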

3. If this isn't supported, are there API calls that could allow me to PUSH content updates to my CDN's cache layer, effectively saying "Hey I just published new content for this cache key. Here it is!"
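From what I can tell, CDNs generally expose purge/invalidation APIs rather than true push APIs: you can't hand the edge a new body, but you can invalidate a cache key so the next request refetches it. As a sketch, Fastly's purge-by-surrogate-key endpoint can be driven like this (the service ID, key, and token below are placeholders; the request is built but not sent, to keep the example self-contained):

```python
import urllib.request

def build_soft_purge(service_id: str, surrogate_key: str, api_token: str):
    """Build (but don't send) a Fastly soft-purge request for one surrogate key.

    A soft purge marks the cached object stale instead of evicting it, so the
    edge can keep serving it stale-while-revalidate style until the refresh
    completes. All three arguments here are placeholders.
    """
    url = f"https://api.fastly.com/service/{service_id}/purge/{surrogate_key}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Fastly-Key": api_token,   # your API token
            "Fastly-Soft-Purge": "1",  # mark stale instead of evicting
        },
    )

# Sending it would just be: urllib.request.urlopen(build_soft_purge(...))
```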

I could use a Cloudflare Worker to implement this kind of caching myself using their KV store, but I thought I'd do a little more research before implementing a code solution to a problem that seems to be pretty common.
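For reference, the logic such a Worker would implement is roughly the following (a toy Python sketch, not Workers code; the names are made up, and the "background" refresh is done inline for simplicity):

```python
import time

MAX_AGE = 100  # freshness window in seconds, per Surrogate-Control: max-age=100

class EdgeCache:
    """Toy serve-stale-while-revalidate cache.

    Serves from cache whenever an entry exists, however stale; a stale hit
    triggers a refresh (inline here, but a real Worker would do it in the
    background, e.g. via event.waitUntil).
    """

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin  # callable: key -> latest body
        self.store = {}                   # key -> (body, stored_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(key)
        if entry is None:
            # Cold cache: this visitor pays the full origin round trip.
            body = self.fetch_origin(key)
            self.store[key] = (body, now)
            return body, "miss"
        body, stored_at = entry
        if now - stored_at <= MAX_AGE:
            return body, "hit-fresh"
        # Stale: serve the old copy immediately, refresh for the next visitor.
        self.store[key] = (self.fetch_origin(key), now)
        return body, "hit-stale"
```

Replaying the timeline from the question:

```python
versions = {"homepage": "V1"}
cache = EdgeCache(lambda key: versions[key])
cache.get("homepage", now=0)    # ("V1", "miss")      - T0, cold cache
cache.get("homepage", now=50)   # ("V1", "hit-fresh") - T50
versions["homepage"] = "V2"     # T80: new content published
cache.get("homepage", now=110)  # ("V1", "hit-stale") - serves stale, refreshes
cache.get("homepage", now=160)  # ("V2", "hit-fresh") - T160
```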

Thanks in advance!

Asked Jan 19 '20 by jamis0n



1 Answer

I've been deploying a similar application recently. I ended up running a customised nginx instance in front of the Next.js server.

  • Ignore cache headers from the upstream server.
    • I wanted to cache markup and JSON, but I didn't want to send Cache-Control headers to the client. You could tweak this config to use the values in Cache-Control from Next.js, and then drop that header before responding to the client if the MIME type is text/html or application/json.
  • Consider all responses valid for 10 minutes.
  • Remove cached responses after 30 days.
  • Use up to 800 MB for the cache.
  • After serving a stale response, attempt to fetch a new response from the upstream server.

This isn't perfect, but it handles the important stale-while-revalidate behaviour. You could run a CDN over this as well if you want the benefit of global propagation.

Warning: This hasn't been extensively tested. I'm not confident that all the behaviour around error pages and response codes is right.

# Available in NGINX Plus
# map $request_method $request_method_is_purge {
#   PURGE   1;
#   default 0;
# }

proxy_cache_path
  /nginx/cache
  inactive=30d
  max_size=800m
  keys_zone=cache_zone:10m;

server {
  listen 80 default_server;
  listen [::]:80 default_server;

  # Basic
  root /nginx;
  index index.html;
  try_files $uri $uri/ =404;

  access_log off;
  log_not_found off;

  # Redirect server error pages to the static page /error.html
  error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 500 501 502 503 504 505 /error.html;

  # Catch error page route to prevent it being proxied.
  location /error.html {}

  location / {
    # Let the backend server know the frontend hostname, client IP, and
    # client–edge protocol.
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header X-Forwarded-Proto $scheme;
    # This header is a standardised replacement for the above two. This line
    # naively ignores any `Forwarded` header passed from the client (which could
    # be another proxy), and instead creates a new value equivalent to the two
    # above.
    proxy_set_header Forwarded "for=$remote_addr;proto=$scheme";

    # Use HTTP/1.1; nginx defaults to HTTP/1.0 for proxied connections
    proxy_http_version 1.1;

    # Available in NGINX Plus
    # proxy_cache_purge $request_method_is_purge;

    # Enable stale-while-revalidate and stale-if-error caching
    proxy_cache_background_update on;
    proxy_cache cache_zone;
    proxy_cache_lock on;
    proxy_cache_lock_age 30s;
    proxy_cache_lock_timeout 30s;

    proxy_cache_use_stale
      error
      timeout
      invalid_header
      updating
      http_500
      http_502
      http_503
      http_504;

    proxy_ignore_headers X-Accel-Expires Expires Cache-Control Vary;
    proxy_cache_valid 10m;

    # Prevent 502 error
    proxy_buffers 8 32k;
    proxy_buffer_size 64k;
    proxy_read_timeout 3600;

    proxy_pass "https://example.com";
  }
}
Answered Oct 20 '22 by Blieque