
How to Deal with Cookies in Varnish stack

Due to slow performance of the site, I started looking into Varnish as a caching solution and have some questions about Google Analytics.

When there are 5K active users on the site (according to GA's real-time traffic report), the load averages on the backend servers spike to 30-40+, Passenger queues start stacking up, and the site is almost unusable. I'm aware of the slow queries and the database work that is required to get better performance, but at the moment I don't have the resources to optimize the queries, the db schema, indexes, etc., so I'm looking into adding Varnish.

I created a diagram to better display the stack; here is what the current stack looks like (the site currently caches images/css/js in a CDN - Akamai):

(diagram of the current stack)

I'd like to add two Varnish instances in front of the backend servers to cache articles, so the stack will look like this:

(diagram of the proposed stack with Varnish)

The site is a news site, and I'm looking for advice on how to properly handle cookies and caching. For phase 1, I'd like to simply exclude authenticated users from the cache completely and serve them dynamic content, since there are not many simultaneous authenticated users.

The confusion is with Google Analytics' cookies. My understanding is that Google sets a cookie on the client using JavaScript, and the client communicates directly with Google, so the backend doesn't need the GA cookies that the client sends, and it's safe to unset them in the vcl_recv subroutine.

sub vcl_recv {
  // Remove has_js and Google Analytics __* cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");
  // Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
}

Questions

  • Is this a safe approach?
  • Will Google still track properly, including repeat visitors?
  • Is there anything else that I need to watch for in my policies for phase1?

Since Varnish by default will NOT cache anything that has a cookie set, is it safe to implement the stack described above by just adding a policy to remove the GA cookies? I understand that without fine-tuning the VCL policies I won't get a high hit rate. However, during my tests even a default Varnish in front of a backend server had a 30% hit rate, and after analyzing those hits I see that most are js/css and image files. So clearly some of those static files are not being served by Akamai or even Apache; instead they are passed to Passenger/Rails to serve a static file. This definitely needs to be corrected.

  • Will Varnish improve performance with just the default configuration?

I'm new to Varnish, so any additional details/advice on Varnish or the stack that I proposed is greatly appreciated.

For phase 2+

Since the content gets updated, I'm planning to execute purges on both Varnish servers, triggered by the backend servers when a change occurs, such as a user comment, page view count, etc.

There are plenty of archived articles that don't get updated, is it safe to cache them forever?

Since I'm planning to use RAM for Varnish storage, should I add a third Varnish instance that uses disk storage, specifically for those archived pages? Perhaps adding an Nginx tier in front of the Varnish servers to direct traffic for archived content to that specific Varnish instance? Load Balancer -> pair of Nginx reverse proxies -> pair of Varnish instances -> (Varnish load-balancing to 8 backend servers)

I'd appreciate any advice on the architecture as well. If you need more details to give better advice, please let me know and I'll be happy to provide them.

Asked Apr 19 '13 by Nerses


1 Answer

That is a lot of questions! :-)

Q. Is this a safe approach?

On the surface, I would say so.

Generally, setting up Varnish on a news site where there is a high volume of traffic and fast-changing content can be a challenge.

A really good way to check is to build a single Varnish box, give it direct access to your cluster (not via the load balancer), and give it a temporary public IP address. That will give you a chance to test your VCL changes. You will be able to test commenting, logging in (if you have it), and anything else, to make sure there are no surprises.

Q. Will Google still track properly, including repeat visitors?

Yes. The cookies are only used on the client side.

One thing you should watch for is that when the backend sends a cookie, Varnish will not cache the content either. You will need to remove any cookies that are not required in vcl_fetch. This might be a problem if cookies are used to track user state.
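As a sketch (in Varnish 3 syntax, matching the vcl_fetch usage later in this answer), stripping backend Set-Cookie headers on paths that never need a session might look like this; the /assets/ path is just an illustrative assumption:

```
sub vcl_fetch {
  # Assumption: nothing under /assets/ needs a session cookie,
  # so drop Set-Cookie there to keep those responses cacheable.
  if (req.url ~ "^/assets/") {
    remove beresp.http.Set-Cookie;
  }
}
```

Be conservative here: stripping Set-Cookie on a path that actually sets session state will break logins for anyone who hits it first.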

Q. Is there anything else that I need to watch for in my policies for phase1?

You will need to disable rack-cache in Rails and set your own headers. Be aware that should you remove Varnish, Rails will be running with no caching and will probably collapse!

This is what I have in my production.rb:

  # We do not use Rack::Cache but rely on Varnish instead
  config.middleware.delete Rack::Cache
  # varnish does not support etags or conditional gets
  # to the backend (which is this app) so remove them too
  config.middleware.delete Rack::ETag
  config.middleware.delete Rack::ConditionalGet

And in my application_controller I have this private method:

def set_public_cache_control(duration)
  if current_user
    response.headers["Cache-Control"] = "max-age=0, private, must-revalidate"
  else
    expires_in duration, :public => true
    response.headers["Expires"] = CGI.rfc1123_date(Time.now + duration)
  end
end

That is called in my other controllers so that I have very fine-grained control over how much caching is applied to various parts of the site. I use a setup method in each controller that is run as a before_filter:

def setup
  set_public_cache_control 10.minutes
end

(The application_controller has the filter and a blank setup method, so it can be optional in the other controllers)
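The effect of that controller method can be sketched in plain Ruby (no Rails). `cache_headers` is a hypothetical stand-in for `set_public_cache_control` above; the Rails `expires_in` helper is approximated by a public `max-age` directive:

```ruby
require "time"
require "cgi"

# Sketch of the cache-control decision: logged-in users are never
# cached; anonymous users get a public, cacheable response.
def cache_headers(duration_seconds, current_user)
  headers = {}
  if current_user
    headers["Cache-Control"] = "max-age=0, private, must-revalidate"
  else
    # expires_in duration, :public => true (approximated)
    headers["Cache-Control"] = "max-age=#{duration_seconds}, public"
    headers["Expires"] = CGI.rfc1123_date(Time.now + duration_seconds)
  end
  headers
end

puts cache_headers(600, nil)["Cache-Control"]   # => "max-age=600, public"
puts cache_headers(600, :user)["Cache-Control"] # => "max-age=0, private, must-revalidate"
```

Varnish (and any upstream cache) will honor the public `max-age`, while the private/must-revalidate variant keeps authenticated responses out of every cache.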

If you have a part of the site that does not require cookies you can strip them off based on URL in the VCL, and apply headers.

You can set the cache time for your static assets in your apache config like this (assuming you are using the default asset path):

<LocationMatch "^/assets/.*$">
    Header unset ETag
    FileETag None
    # RFC says only cache for 1 year
    ExpiresActive On
    ExpiresDefault "access plus 1 year"
    Header append Cache-Control "public"
</LocationMatch>

<LocationMatch "^/favicon\.(ico|png)$">
    Header unset ETag
    FileETag None
    ExpiresActive On
    ExpiresDefault "access plus 1 day"
    Header append Cache-Control "public"
</LocationMatch>

<LocationMatch "^/robots\.txt$">
    Header unset ETag
    FileETag None
    ExpiresActive On
    ExpiresDefault "access plus 1 hour"
    Header append Cache-Control "public"
</LocationMatch>

Those headers will be sent to your CDN, which will cache the assets for much longer. Watching Varnish, you'll still see requests coming in, but at a declining rate.

I would also set very short cache times on content where the pages don't need cookies but change quite often. In my case I set a cache time of 10 seconds for the home page. For Varnish this means that at most one user request will go to the backend every 10 seconds.

You should also consider setting varnish to use grace mode. This allows it to serve slightly stale content from the cache in preference to exposing visitors to a slow response from the backend for items that have just expired.
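A minimal grace-mode sketch, again in Varnish 3 syntax (the 2-minute window is an arbitrary example, not a recommendation):

```
sub vcl_recv {
  # Accept objects up to 2 minutes past their TTL.
  set req.grace = 2m;
}

sub vcl_fetch {
  # Keep expired objects around for 2 minutes so grace mode can serve them.
  set beresp.grace = 2m;
}
```

With this in place, an expired object is served stale while a single request refreshes it in the background, instead of a burst of visitors all waiting on the backend.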

Q. There are plenty of archived articles that don't get updated, is it safe to cache them forever?

To do this you would need to change your app to send different headers for those articles which are archived. This assumes they won't have cookies. Based on what I do on my site, I would do it this way:

In the setup above add a conditional to change the cache time:

def setup
  # check if it is old. This code could be anything
  if news.last_updated_at < 1.months.ago
    set_public_cache_control 1.year
  else
    set_public_cache_control 10.minutes
  end
end 

This sets a public header, so Varnish will cache it (if there are no cookies), and so will any remote caches (at ISP or corporate gateways).

The problem with this is if you want to remove the story, or update it (say, for legal reasons).

In that case you should send Varnish a private header to change the TTL for that one URL, but send a shorter public header for everyone else.

That would allow you to set Varnish to serve the content for (say) 1 year, while it sends out headers to tell clients to come back every 10 minutes.

You would need to add a regime for purging Varnish in those cases.
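A common Varnish 3 pattern for that is to accept PURGE requests from a trusted ACL; the subnet below is an assumption, and your backends would issue the PURGE when an article changes:

```
acl purgers {
  # Assumption: the backend servers live on this private subnet.
  "10.0.0.0"/24;
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purgers) {
      error 405 "Not allowed";
    }
    return (lookup);
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged";
  }
}
```

The backend can then send `PURGE /path/to/article` to both Varnish instances whenever a story is edited or removed.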

To get you started, I have a second method in my application_controller:

def set_private_cache_control(duration=5.seconds)
  # logged in users never have cached content so no TTL allowed
  if ! current_user
    # This header MUST be a string or the app will crash
    if duration
      response.headers["X-Varnish-TTL"] = duration.to_s
    end
  end
end

And in my vcl_fetch I have this:

call set_varnish_ttl_from_header;

and the vcl function is this:

sub set_varnish_ttl_from_header {
  if (beresp.http.X-Varnish-TTL) {
    C{  
      char *x_end = 0;
      const char *x_hdr_val = VRT_GetHdr(sp, HDR_BERESP, "\016X-Varnish-TTL:"); /* "\016" is length of header plus colon in octal */
      if (x_hdr_val) {
        long x_cache_ttl = strtol(x_hdr_val, &x_end, 0);
        if (ERANGE != errno && x_end != x_hdr_val && x_cache_ttl >= 0 && x_cache_ttl < INT_MAX) {
          VRT_l_beresp_ttl(sp, (x_cache_ttl * 1));
        }
      }
    }C
    remove beresp.http.X-Varnish-TTL;
  }
}

That is so the TTL does NOT get passed on to any upstream caches (as s-maxage would be).

The setup method would look like this:

def setup
  # check if it is old. This code could be anything
  if news.last_updated_at < 1.months.ago
    set_public_cache_control 10.minutes
    set_private_cache_control 1.year
  else
    set_public_cache_control 10.minutes
  end
end 

Feel free to ask any supplementary questions, and I'll update this answer!

Answered Sep 26 '22 by Richard Hulse