Why isn't Varnish sending 304 Not Modified when an If-Modified-Since header is sent?

When sending a GET request directly to the backend with If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET set, Apache correctly returns a 304 with no content.

When I send the same request through Varnish 3.0.2, it responds with a 200 and resends the entire body even though the client already has it, which wastes bandwidth. My understanding is that Varnish handles this header itself and should send a 304, so I assume I've done something wrong in my .vcl file.

Varnishlog gives this:

   16 SessionOpen  c 84.97.17.233 64416 :80
   16 ReqStart     c 84.97.17.233 64416 1597323690
   16 RxRequest    c GET
   16 RxURL        c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
   16 RxProtocol   c HTTP/1.0
   16 RxHeader     c Host: www.quotaproject.org
   16 RxHeader     c User-Agent: Sprawk/1.3 (http://www.sprawk.com/)
   16 RxHeader     c Accept: */*
   16 RxHeader     c Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
   16 RxHeader     c Connection: close
   16 RxHeader     c If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET
   16 VCL_call     c recv lookup
   16 VCL_call     c hash
   16 Hash         c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
   16 Hash         c www.quotaproject.org
   16 VCL_return   c hash
   16 Hit          c 1597322756
   16 VCL_call     c hit
   16 VCL_acl      c NO_MATCH CTRLF5
   16 VCL_return   c deliver
   16 VCL_call     c deliver deliver
   16 TxProtocol   c HTTP/1.1
   16 TxStatus     c 200
   16 TxResponse   c OK
   16 TxHeader     c Server: Apache
   16 TxHeader     c Last-Modified: Wed, 09 Jun 2004 16:07:50 GMT
   16 TxHeader     c Vary: Accept-Encoding
   16 TxHeader     c Content-Type: application/pdf
   16 TxHeader     c Date: Wed, 22 Feb 2012 18:25:05 GMT
   16 TxHeader     c Age: 12432
   16 TxHeader     c Connection: close
   16 Gzip         c U D - 107685 115763 80 796748 861415
   16 Length       c 98304
   16 ReqEnd       c 1597323690 1329935105.713264704 1329935106.208528996 0.000071526 0.000068426 0.495195866
   16 SessionClose c EOF mode
   16 StatSess     c 84.97.17.233 64416 0 1 1 0 0 0 203 98304

If I understand this correctly, the object is already in Varnish's cache (the log shows a Hit), so Varnish doesn't need to contact the backend. But since it already knows the Last-Modified date, why doesn't it respond with a 304?

And here's my VCL file:

backend idea {
  # .host = "www.idea.int";
  .host = "83.145.60.235"; # IDEA's public website IP
  .port = "80";
}
backend qp {
  # .host = "www.quotaproject.org";
  .host = "83.145.60.235"; # IDEA's public website IP
  .port = "80";
}
#
#Below is a commented-out copy of the default VCL logic.  If you
#redefine any of these subroutines, the built-in logic will be
#appended to your code.
#
sub vcl_recv {
  # force domain so that Apache handles the VH correctly
  if (req.http.host ~ "^qp" || req.http.host ~ "quotaproject.org$") {
    set req.http.Host = "www.quotaproject.org";
    set req.backend = qp;
  } else {
    # default to idea.int
    set req.http.Host = "www.idea.int";
    set req.backend = idea;
  }
  # Before anything else we need to fix gzip compression 
  if (req.http.Accept-Encoding) {
      if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
          # No point in compressing these
          remove req.http.Accept-Encoding;
      } else if (req.http.Accept-Encoding ~ "gzip") {
          set req.http.Accept-Encoding = "gzip";
      } else if (req.http.Accept-Encoding ~ "deflate") {
          set req.http.Accept-Encoding = "deflate";
      } else {
          # unknown algorithm
          remove req.http.Accept-Encoding;
      }
  }
  # AJAX requests bypass the cache. TODO: make sure your JavaScript implementation for AJAX actually sets XMLHttpRequest
  if (req.http.X-Requested-With == "XMLHttpRequest") {
    return(pass);
  }
  if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE") {
    /* Non-RFC2616 or CONNECT which is weird. */
    return (pipe);
  }
  # Purge everything url - this isn't the squid way, but works
  if (req.url ~ "^/varnishpurge") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    if (req.url == "/varnishpurge") {
      ban("req.http.host == " + req.http.host + " && req.url ~ ^/");
      error 841 "Purged site.";
    } else {
      ban("req.http.host == " + req.http.host + " && req.url ~ ^" + regsub( req.url, "^/varnishpurge(.*)$", "\1" ) + "$");
      error 842 "Purged page.";
    }
  }
  # spoof the client IP (taken from http://utvbloggen.se/snabb-guide-till-varnish/)
  remove req.http.X-Forwarded-For;
  set req.http.X-Forwarded-For = client.ip;
  # Force delivery from cache even if other things indicate otherwise
  if (req.url ~ "\.(flv)") {
    # pipe flash start away
    return(pipe);
  }
  if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
    # cookies are irrelevant here
    unset req.http.Cookie;
    unset req.http.Authorization; 
  }
  # Force short-circuit to the real site for these dynamic pages
  if (req.url ~ "/customcf/" || req.url ~ "/uid/editData.cfm" || req.url ~ "^/private/") {
    return(pass);
  }
  # Remove user agent, since Apache will serve these resources the same way
  if (req.http.User-Agent) {
    set req.http.User-Agent = "";
  }
  if (req.http.Cookie) {
    # removes all cookies named __utm? (utma, utmb...) - tracking thing
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|; ) *__utm.=[^;]+;? *", "\1"); 
    # remove cStates for RHM boxes (the server doesn't need to know these, JS will handle this client-side)
    set req.http.cookie = regsub(req.http.cookie, "(; )?cStates=[^;]*", ""); #cStates might sometimes have a blank value
    # remove ColdFusion session cookie stuff
    if (!req.url ~ "^/publications/" && !req.url ~ "^/uid/admin/") {
      set req.http.cookie = regsub(req.http.cookie, "(; )?CFID=[^;]+", "");
      set req.http.cookie = regsub(req.http.cookie, "(; )?CFTOKEN=[^;]+", "");
    }
    # Remove the cookie header if it's empty after cleanup
    if (req.http.cookie ~ "^;? *$") {
      # The only cookie data left is a semicolon or spaces
      remove req.http.cookie;
    }
  }
}
#
# Called when the requested object was found in the cache
#
sub vcl_hit {
  # Allow administrators to easily flush the cache from their browser
  if (client.ip ~ CTRLF5) {
    if (req.http.pragma ~ "no-cache" || req.http.Cache-Control ~ "no-cache") {
      set obj.ttl = 0s;
      return(pass);
    }
  }
}
#
# Called when the requested object has been retrieved from the
# backend, or the request to the backend has failed
#
sub vcl_fetch {
  set beresp.grace = 1h;
  # strip the cookie before the image is inserted into cache.
  if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
    remove beresp.http.set-cookie;
    set beresp.ttl = 100w;
  }
  # Remove CF session cookies for everything but the publications subsite
  if (!req.url ~ "^/publications/" && !req.url ~ "/customcf/" && !req.url ~ "^/uid/admin/" && !req.url ~ "^/uid/editData.cfm") {
    remove beresp.http.set-cookie;
  }
  if (beresp.ttl < 48h) {
    set beresp.ttl = 48h;
  }
}
#
# Called before a cached object is delivered to the client
#
sub vcl_deliver {
  # We'll be hiding some headers added by Varnish. We want to make sure people are not seeing we're using Varnish.
  remove resp.http.X-Varnish;
  remove resp.http.Via;
  # We'd like to hide the X-Powered-By headers. Nobody has to know we can run PHP and have version xyz of it.
  remove resp.http.X-Powered-By;
}

Can anyone see the problem or problems?

Update: According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25

Note: When handling an If-Modified-Since header field, some
      servers will use an exact date comparison function, rather than a
      less-than function, for deciding whether to send a 304 (Not
      Modified) response.

It seems this may be Varnish's behaviour: I'm sending a date that is later than the file's real last-modified date, but not exactly the value cached in Varnish.
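
To make the two comparison strategies in that note concrete, here is a minimal Python sketch (not Varnish's actual code). The function names are hypothetical, and the dates are the ones from the log above, with the request date treated as GMT for simplicity.

```python
from email.utils import parsedate_to_datetime


def not_modified_exact(if_modified_since: str, last_modified: str) -> bool:
    """Exact-match strategy: answer 304 only if both dates are identical."""
    return parsedate_to_datetime(if_modified_since) == parsedate_to_datetime(last_modified)


def not_modified_less_than(if_modified_since: str, last_modified: str) -> bool:
    """Less-than strategy: answer 304 if the object has not changed since the given date."""
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)


last_modified = "Wed, 09 Jun 2004 16:07:50 GMT"   # from the TxHeader line in the log
request_date = "Wed, 15 Feb 2012 07:25:00 GMT"    # assumption: GMT instead of CET

exact = not_modified_exact(request_date, last_modified)
less_than = not_modified_less_than(request_date, last_modified)
```

With these dates the exact-match strategy would resend the body, while the less-than strategy would answer 304.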

Nic Cottrell asked Feb 22 '12 18:02



2 Answers

The problem is the non-GMT time zone in the If-Modified-Since request header:

If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET

According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.3

All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception.

Varnish implements this as a strict requirement, whereas Apache handles nonstandard date formats more robustly. This is why you observed different behavior when querying Apache directly.
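
As a rough illustration of what strict handling could look like (a Python sketch, not Varnish's actual implementation), assume a parser that accepts only the RFC 1123 form ending in GMT and ignores the header otherwise. The names parse_http_date_strict and status_for are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_http_date_strict(value: str) -> Optional[datetime]:
    """Parse an RFC 1123 HTTP-date; return None unless it ends in 'GMT'."""
    try:
        parsed = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


def status_for(if_modified_since: str, last_modified: str) -> int:
    """Return 304 only when both dates parse and the object is not newer."""
    ims = parse_http_date_strict(if_modified_since)
    lm = parse_http_date_strict(last_modified)
    if ims is None or lm is None:
        # Assumption: an unparseable date is treated as if no header was sent.
        return 200
    return 304 if lm <= ims else 200


cet_status = status_for("Wed, 15 Feb 2012 07:25:00 CET", "Wed, 09 Jun 2004 16:07:50 GMT")
gmt_status = status_for("Wed, 15 Feb 2012 06:25:00 GMT", "Wed, 09 Jun 2004 16:07:50 GMT")
```

Under these assumptions the CET value is rejected by the parser and yields a 200, while the equivalent GMT value yields a 304.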

Andy S answered Oct 03 '22 22:10



Since this question is still open with no answers and several up votes, I'll post an answer.

This does not seem to be an issue with Varnish 3.0.0 (which we are using) or the current version of Varnish you are running on your site.

200 OK response when the If-Modified-Since date is earlier than the Last-Modified date:

# curl -z "Wed, 09 Jun 2010 16:07:50 GMT" --head "www.quotaproject.org/robots.txt"
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Tue, 22 Jan 2013 13:23:41 GMT
Vary: Accept-Encoding
Cache-Control: public
Content-Type: text/plain; charset=UTF-8
Date: Mon, 25 Nov 2013 15:00:45 GMT
Age: 69236
Connection: keep-alive
X-Cache: HIT

304 response when the If-Modified-Since date is after the Last-Modified date:

# curl -z "Wed, 09 Jun 2013 16:07:50 GMT" --head "www.quotaproject.org/robots.txt"
HTTP/1.1 304 Not Modified
Server: Apache
Last-Modified: Tue, 22 Jan 2013 13:23:41 GMT
Vary: Accept-Encoding
Cache-Control: public
Content-Type: text/plain; charset=UTF-8
Date: Mon, 25 Nov 2013 15:00:52 GMT
Age: 69243
Connection: keep-alive
X-Cache: HIT

The same goes for the example from your varnishlog output, with the CET time zone left in place:

# curl -z "Wed, 15 Feb 2012 07:25:00 CET" --head "www.quotaproject.org/fr/CS/CS_AU-Maboreke-6-6-2004.pdf"
HTTP/1.1 304 Not Modified
Server: Apache
Last-Modified: Wed, 09 Jun 2004 16:07:50 GMT
Cache-Control: public
Content-Type: application/pdf
Accept-Ranges: bytes
Date: Mon, 25 Nov 2013 15:08:48 GMT
Age: 335802
Connection: keep-alive
X-Cache: HIT

I would say Varnish works as expected. Maybe this was a problem with the Varnish build you were using or there was something amiss with the testing methodology. I couldn't see any problems with your VCL either.
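
If the testing methodology is in doubt, one way to rule out the time zone as a factor is to generate the If-Modified-Since value in GMT before sending it. The following Python sketch shows one way to do that; the helper name to_http_date is hypothetical, and CET is modelled as a fixed UTC+1 offset.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: CET is a fixed UTC+1 offset (daylight saving is ignored here).
CET = timezone(timedelta(hours=1), "CET")


def to_http_date(local_time: datetime) -> str:
    """Convert a timezone-aware datetime to an RFC 1123 HTTP-date in GMT."""
    return format_datetime(local_time.astimezone(timezone.utc), usegmt=True)


header_value = to_http_date(datetime(2012, 2, 15, 7, 25, 0, tzinfo=CET))
print(header_value)  # Wed, 15 Feb 2012 06:25:00 GMT
```

The resulting string can be passed to curl -z in place of the CET value used above.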

Ketola answered Oct 03 '22 22:10
