
Technique for subclassing Django UpdateCacheMiddleware and FetchFromCacheMiddleware

I've used UpdateCacheMiddleware and FetchFromCacheMiddleware to enable site-wide anonymous caching, with varying levels of success.

The biggest problem is that the middleware only caches an anonymous user's first request. Since a session_id cookie is set on that first response, subsequent requests by that anonymous user do not hit the cache, because the cache key varies on request headers (here, the Cookie header).
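
To make the problem concrete, here is a minimal sketch of a cache key that varies on request headers. It is a toy model, not Django's actual key function: `toy_cache_key`, the path, and the cookie values are all hypothetical and exist only for illustration.

```python
import hashlib


def toy_cache_key(path, request_headers, vary_on):
    """Toy page cache key: a hash of the path plus a hash of the varied header values."""
    header_hash = hashlib.sha256()
    for name in vary_on:
        # A missing header contributes an empty string, as on a first visit.
        header_hash.update(request_headers.get(name, "").encode("utf-8"))
    path_hash = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return "toy.%s.%s" % (path_hash, header_hash.hexdigest())


# Hypothetical anonymous requests for the same page.
anonymous_requests = [
    {},                              # first visit, no cookie yet
    {"Cookie": "sessionid=aaa111"},  # same visitor after the session cookie is set
    {"Cookie": "sessionid=bbb222"},  # a different anonymous visitor
]

# Varying on Cookie gives every request its own key, so none share a cache entry.
keys_varying_on_cookie = {
    toy_cache_key("/home/", headers, ["Cookie"]) for headers in anonymous_requests
}

# Ignoring headers collapses all anonymous requests onto one key.
keys_ignoring_headers = {
    toy_cache_key("/home/", headers, []) for headers in anonymous_requests
}
```

With three requests, the first set holds three distinct keys and the second holds one, which is the difference between "only the first request is cached" and "all anonymous requests share a page".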

My webpages do not meaningfully vary among anonymous users and, insofar as they do vary, I can handle that via Ajax. As a result, I decided to subclass Django's caching middleware so that it no longer varies on headers. Instead, it varies on anonymous vs. logged-in users. Because I am using the auth backend, and its handler runs before the fetch from the cache, it seems to work.

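from django.conf import settings
from django.core.cache import get_cache
from django.middleware.cache import FetchFromCacheMiddleware, UpdateCacheMiddleware
# Note: the underscore-prefixed helpers are private to django.utils.cache.
from django.utils.cache import (
    _generate_cache_header_key, _generate_cache_key,
    get_max_age, patch_response_headers,
)


class AnonymousUpdateCacheMiddleware(UpdateCacheMiddleware):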

    def process_response(self, request, response):
        """
        Sets the cache, if needed.
        We are overriding it in order to change the behavior of learn_cache_key().
        """

        if not self._should_update_cache(request, response):
            # We don't need to update the cache, just return.
            return response
        if response.status_code != 200:
            return response

        timeout = get_max_age(response)
        if timeout is None:
            timeout = self.cache_timeout
        elif timeout == 0:
            # max-age was set to 0, don't bother caching.
            return response
        patch_response_headers(response, timeout)
        if timeout:
            # The key change: call our own learn_cache_key() method instead of
            # django.utils.cache.learn_cache_key(), so the Vary header is ignored.
            cache_key = self.learn_cache_key(request, response, timeout, self.key_prefix, cache=self.cache)
            if hasattr(response, 'render') and callable(response.render):
                response.add_post_render_callback(
                    lambda r: self.cache.set(cache_key, r, timeout)
                )
            else:
                self.cache.set(cache_key, response, timeout)
        return response

    def learn_cache_key(self, request, response, timeout, key_prefix, cache=None):
        """
        _generate_cache_header_key() creates a key for the given request path, adjusted for locales.

        The header list is stored under that key, and _generate_cache_key() then builds the
        key under which the HttpResponse itself is cached.

        A subsequent anonymous request to this path hits FetchFromCacheMiddleware in the
        request phase, which looks up the header list cached here on the initial response.

        FetchFromCacheMiddleware calculates a cache key from the values of the listed headers
        using _generate_cache_key() and looks for the response stored under that key. If the
        header values are the same as those seen here, there is a cache hit and the cached
        HttpResponse is returned.
        """

        key_prefix = key_prefix or settings.CACHE_MIDDLEWARE_KEY_PREFIX
        # Use the timeout passed in by the caller before falling back to the defaults.
        cache_timeout = timeout or self.cache_timeout or settings.CACHE_MIDDLEWARE_SECONDS
        cache = cache or get_cache(settings.CACHE_MIDDLEWARE_ALIAS)

        cache_key = _generate_cache_header_key(key_prefix, request)

        # Django normally varies caching by headers so that authed/anonymous users do not see same pages
        # This makes Google Analytics cookies break caching;
        # It also means that different anonymous session_ids break caching, so only first anon request works
        # In this subclass, we are ignoring headers and instead varying on authed vs. anonymous users
        # Alternatively, we could also strip cookies potentially for the same outcome

        # if response.has_header('Vary'):
        #     headerlist = ['HTTP_' + header.upper().replace('-', '_')
        #                   for header in cc_delim_re.split(response['Vary'])]
        # else:
        headerlist = []

        cache.set(cache_key, headerlist, cache_timeout)
        return _generate_cache_key(request, request.method, headerlist, key_prefix)
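
The two-step lookup described in the docstring above can be sketched without Django. This is a simplified model under stated assumptions: `ToyPageCache`, its method names, and the key formats are hypothetical, and a plain dict stands in for the cache backend.

```python
import hashlib


class ToyPageCache:
    """Dict-backed model of the 'learn header list, then key on header values' scheme."""

    def __init__(self):
        self.store = {}

    def _header_key(self, path):
        # Step 1 key: depends only on the path.
        return "headers:" + path

    def _page_key(self, path, request_headers, headerlist):
        # Step 2 key: depends on the path and the values of the listed headers.
        digest = hashlib.sha256()
        for name in headerlist:
            digest.update(request_headers.get(name, "").encode("utf-8"))
        return "page:%s:%s" % (path, digest.hexdigest())

    def learn(self, path, request_headers, headerlist, body):
        """Response phase: store the header list, then the page under its key."""
        self.store[self._header_key(path)] = list(headerlist)
        self.store[self._page_key(path, request_headers, headerlist)] = body

    def fetch(self, path, request_headers):
        """Request phase: look up the header list, rebuild the key, return the page or None."""
        headerlist = self.store.get(self._header_key(path))
        if headerlist is None:
            return None
        return self.store.get(self._page_key(path, request_headers, headerlist))


page = "<html>home</html>"

# Stock behaviour in this model: the learned header list contains Cookie.
varying = ToyPageCache()
varying.learn("/home/", {"Cookie": "sessionid=aaa111"}, ["Cookie"], page)
miss = varying.fetch("/home/", {"Cookie": "sessionid=bbb222"})

# The subclass's behaviour in this model: the learned header list is empty.
ignoring = ToyPageCache()
ignoring.learn("/home/", {"Cookie": "sessionid=aaa111"}, [], page)
hit = ignoring.fetch("/home/", {"Cookie": "sessionid=bbb222"})
```

In this model `miss` is `None` and `hit` is the cached page, which mirrors why forcing `headerlist = []` lets different anonymous sessions share one entry.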

The fetcher, which is responsible for retrieving the page from the cache, looks like this:

class AnonymousFetchFromCacheMiddleware(FetchFromCacheMiddleware):

    def process_request(self, request):
        """
        Checks whether the page is already cached and returns the cached
        version if available.
        """
        if request.user.is_authenticated():
            request._cache_update_cache = False
            return None
        else:
            return super(AnonymousFetchFromCacheMiddleware, self).process_request(request)

There was a lot of copying for UpdateCacheMiddleware, obviously. I couldn't figure out a better hook to make this cleaner.

Does this generally seem like a good approach? Any obvious issues that come to mind?

Thanks, Ben

Ben asked Jun 07 '12 22:06



1 Answer

You may work around this by temporarily removing unwanted vary fields from response['Vary']:

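from django.middleware.cache import UpdateCacheMiddleware
from django.utils.cache import cc_delim_re

class AnonymousUpdateCacheMiddleware(UpdateCacheMiddleware):
    def process_response(self, request, response):
        vary = None
        if not request.user.is_authenticated() and response.has_header('Vary'):
            vary = response['Vary']
            # Only hide Cookie here; add more header names as your usage requires.
            # Compare case-insensitively, since the header is usually written 'Cookie'.
            response['Vary'] = ', '.join(
                v for v in cc_delim_re.split(vary) if v.lower() != 'cookie')
        # Let the stock middleware learn the cache key from the trimmed Vary header.
        response = super(AnonymousUpdateCacheMiddleware, self).process_response(request, response)
        if vary is not None:
            # Restore the original header before the response leaves the middleware.
            response['Vary'] = vary
        return response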
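
The header-trimming step can be tried on its own, outside Django. The following is a minimal sketch, assuming a comma-splitting pattern in the spirit of `cc_delim_re`; the helper name `strip_vary` and its `unwanted` parameter are hypothetical.

```python
import re

# Splits a header value on commas, swallowing surrounding whitespace.
COMMA_RE = re.compile(r"\s*,\s*")


def strip_vary(vary_value, unwanted=("cookie",)):
    """Return the Vary value with the unwanted header names removed (case-insensitive)."""
    unwanted_lower = {name.lower() for name in unwanted}
    kept = [
        name
        for name in COMMA_RE.split(vary_value.strip())
        if name and name.lower() not in unwanted_lower
    ]
    return ", ".join(kept)


trimmed = strip_vary("Cookie, Accept-Encoding")
trimmed_mixed_case = strip_vary("Accept-Language,cookie , Accept-Encoding")
```

The case-insensitive comparison matters because a literal comparison against `'cookie'` would leave a `Cookie` entry in place.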

Also, set CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True in settings to prevent caching for authenticated users.
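
A settings fragment for this setup might look like the sketch below. It assumes the `MIDDLEWARE_CLASSES` setting used by Django releases of that era (newer releases may differ); the module path `myproject.middleware` is hypothetical, and the timeout value is only illustrative. The update middleware goes first and the fetch middleware last, after the session and auth middleware, so that `request.user` is available when the cache is consulted.

```python
# settings.py fragment (sketch; adjust names and values to your project)
MIDDLEWARE_CLASSES = (
    # Hypothetical module path for the subclass shown above.
    "myproject.middleware.AnonymousUpdateCacheMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # Stock fetch middleware, placed after the auth middleware.
    "django.middleware.cache.FetchFromCacheMiddleware",
)

# Skip the cache for authenticated users.
CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True

# Illustrative value: cache pages for ten minutes.
CACHE_MIDDLEWARE_SECONDS = 600
```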

okm answered Sep 29 '22 11:09
