Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stripping out select querystring attribute/value pairs so varnish will not vary cache by them

My goal is to "whitelist" certain querystring attributes and their values so varnish will not vary cache between the urls.

Example:

Url 1: http://foo.com/someproduct.html?utm_code=google&type=hello  
Url 2: http://foo.com/someproduct.html?utm_code=yahoo&type=hello  
Url 3: http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye

In the above example I want to whitelist "utm_code" but not "type" So after the first url is hit I want varnish to serve that cached content to the second url.

However, in the case of the third url, the attribute "type" value is different so that should be a varnish cache miss.

I have tried the 2 methods below (found on a drupal help article I can't locate right now) that did not seem to work. Might be because I have the regex wrong.

# 1. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");

# 2. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_campaign=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])foo_bar=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])bar_baz=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");
like image 652
runamok Avatar asked Oct 30 '12 17:10

runamok


3 Answers

I figured this out and wanted to share. I found this code that makes a subroutine that does what I need.

sub vcl_recv {

    # strip out certain querystring params that varnish should not vary cache by
    call normalize_req_url;

    # snip a bunch of other code
}

sub normalize_req_url {

    # Strip out Google Analytics campaign variables. They are only needed
    # by the javascript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if(req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%.-_A-z0-9]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&?)$", "");
}
like image 67
runamok Avatar answered Nov 16 '22 09:11

runamok


There's something wrong with the RegEx.
I changed the RegExes used in both regsub calls:

sub normalize_req_url {
    # Clean up root URL
    if (req.url ~ "^/(?:\?.*)?$") {
        set req.url = "/";
    }

    # Strip out Google Analytics campaign variables
    # They are only needed by the javascript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%\._A-z0-9-]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&|\?|&)$", "");
}

The first change is the part "[%._A-z0-9-]", because the dash functioned like a range symbol, that's why I've moved it to the end, and the dot should be escaped.

The second change is to not only remove a question mark at the remaining URL, but also an ampersand or question mark and ampersand.

like image 36
kipusoep Avatar answered Nov 16 '22 09:11

kipusoep


From https://github.com/mattiasgeniar/varnish-4.0-configuration-templates:

# Some generic URL manipulation, useful for all templates that follow
# First remove the Google Analytics added parameters, useless for our backend
if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
  set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
  set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
  set req.url = regsub(req.url, "\?&", "?");
  set req.url = regsub(req.url, "\?$", "");
}
like image 1
Pere Avatar answered Nov 16 '22 08:11

Pere