I'd like to use an HTTP proxy (such as nginx) to cache large/expensive requests. These resources are identical for any authorized user, but their authentication/authorization needs to be checked by the backend on each request.
It sounds like `Cache-Control: public, max-age=0`, along with the nginx directive `proxy_cache_revalidate on;`, is the way to do this. The proxy can cache the resource, but every subsequent request must make a conditional GET to the backend to confirm the client is authorized before the cached copy is returned. The backend then responds with a 403 if the user is unauthorized, a 304 if the user is authorized and the cached resource isn't stale, or a 200 with the new resource if it has changed.
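For reference, here's roughly the kind of nginx configuration I have in mind (the backend port, cache path and zone name below are just placeholders):

# In the http block: define the cache (path, zone name and sizes are placeholders).
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=app_cache:10m max_size=1g inactive=60m;

server {
    listen 4000;

    location / {
        proxy_pass http://localhost:8080;    # hypothetical backend that checks auth on every request
        proxy_cache app_cache;
        proxy_cache_revalidate on;           # revalidate stale entries with a conditional GET
        add_header X-Cached $upstream_cache_status;
    }
}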
In nginx, if `max-age=0` is set the response isn't cached at all. If `max-age=1` is set, then if I wait 1 second after the initial request nginx does perform the conditional GET, but within that first second it serves the response directly from cache, which is obviously very bad for a resource that needs to be authenticated.
Is there a way to get nginx to cache the response but immediately require revalidation?
Note that this does work correctly in Apache. Here are examples for both nginx and Apache, the first two with `max-age=5` and the last two with `max-age=0`:
# Apache with `Cache-Control: public, max-age=5`
$ while true; do curl -v http://localhost:4001/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cache: MISS from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: HIT from 172.x.x.x
# nginx with `Cache-Control: public, max-age=5`
$ while true; do curl -v http://localhost:4000/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cached: MISS
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: HIT
< X-Cached: REVALIDATED
< X-Cached: HIT
< X-Cached: HIT
# Apache with `Cache-Control: public, max-age=0`
# THIS IS WHAT I WANT
$ while true; do curl -v http://localhost:4001/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cache: MISS from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
< X-Cache: REVALIDATE from 172.x.x.x
# nginx with `Cache-Control: public, max-age=0`
$ while true; do curl -v http://localhost:4000/ >/dev/null 2>&1 | grep X-Cache; sleep 1; done
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
< X-Cached: MISS
As you can see, in the first two examples the responses can be cached by both Apache and nginx, and Apache correctly caches even `max-age=0` responses, but nginx does not.
I would like to address the additional questions/concerns that have come up during the conversation since my original answer of simply using `X-Accel-Redirect` (or, if Apache compatibility is desired, `X-Sendfile`).
The "optimal" solution that you seek (the one without `X-Accel-Redirect`) is incorrect, for more than one reason:
- All it takes is a request from an unauthenticated user for your cache entry to be wiped clean.
- If every other request comes from an unauthenticated user, you effectively have no cache at all.
- Anyone can make requests to the public URL of the resource to keep your cache wiped clean at all times.
- If the files served are, in fact, static, then you're wasting extra memory, time, disc and vm/cache space by keeping more than one copy of each file.
- If the content served is dynamic:
  - Does authentication cost about the same as generating the resource? Then what do you actually gain by caching it when revalidation is always required? A constant factor of less than 2x? You might as well not bother with caching just to tick a checkbox, as the real-world improvement would be negligible.
  - Is it vastly more expensive to generate the view than to perform authentication? Then it sounds like a good idea to cache the view and serve it to tens of thousands of requests at peak time! But for that to work you had better not have any unauthenticated users lurking around, as even a couple could cause the significant and unpredictable expense of having to regenerate the view.
- What happens to the cache in various edge cases? What if the user is denied access without the developer using the appropriate code, and that response gets cached? What if the next administrator decides to tweak a setting or two, e.g. `proxy_cache_use_stale`? Suddenly you have unauthenticated users receiving privileged information. You're leaving all sorts of cache-poisoning attack vectors around by needlessly coupling independent parts of your application.
I don't think it's technically correct to return `Cache-Control: public, max-age=0` for a page that requires authentication. I believe the correct response might use `must-revalidate` or `private` in place of `public`.
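Purely as an illustration of the header's shape (shown here as an nginx `add_header`, though normally the backend application would emit it; the exact policy depends on your application):

# Hypothetical illustration only: a more conservative Cache-Control for authenticated pages,
# combining both suggestions above.
add_header Cache-Control "private, max-age=0, must-revalidate" always;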
The nginx "deficiency" you describe, namely the lack of support for immediate revalidation with `max-age=0`, is by design (much like its lack of support for `.htaccess`).
As per the points above, it makes little sense to require immediate revalidation of a given resource; it's simply an approach that doesn't scale, especially when you have a "ridiculous" number of requests per second that must all be satisfied using minimal resources and under no uncertain terms.
If you require a web-server designed by a "committee", with backwards compatibility for every kitchen-sink application and every questionable part of any RFC, nginx is simply not the correct solution.
On the other hand, `X-Accel-Redirect` is really simple, foolproof and a de-facto standard. It lets you separate content from access control in a very neat way. It's dead simple, and it ensures that your content will actually be cached, instead of your cache being wiped clean willy-nilly. It is the correct solution and worth pursuing. Trying to avoid an "extra" request every 10K servings at peak time, at the price of having only "one" request when no caching is needed in the first place and effectively no cache when those 10K requests do arrive, is not the correct way to design a scalable architecture.
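To make this concrete, here is a minimal sketch of what the nginx side of the `X-Accel-Redirect` split can look like (ports, paths and the cache zone name are hypothetical; the backend is assumed to answer the auth check with an `X-Accel-Redirect: /protected/...` header instead of the content itself):

# In the http block: a cache for the actual content (names and sizes are placeholders).
proxy_cache_path /var/cache/nginx/content keys_zone=content_cache:10m max_size=1g;

server {
    listen 4000;

    # Every request hits the backend, but only for authentication/authorization.
    location / {
        proxy_pass http://localhost:8080;    # hypothetical auth backend
        # On success the backend returns an empty response carrying
        #   X-Accel-Redirect: /protected/<path>
        # which makes nginx perform an internal redirect to the location below.
        # On failure it returns 403 and the content location is never reached.
    }

    # Internal-only location that serves (and caches) the expensive content.
    location /protected/ {
        internal;
        proxy_pass http://localhost:8081/;   # hypothetical content backend
        proxy_cache content_cache;
        proxy_cache_valid 200 10m;           # cache freely; access control already happened above
    }
}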