
Regex to find files matching file extension except if filename contains string

I have caching enabled for specific files in nginx, like this:

location ~* \.(?:css|js)$ {
access_log off;
add_header Cache-Control "no-transform,public,max-age=31536000,s-maxage=31536000";
expires 1y;
}   

What I'd like to do here is to exclude all files matching the pattern i18n-*.js, and as a result, cache all .js files except for the ones starting with i18n.

I tried to use a negative lookahead to exclude the pattern, but it doesn't work as expected because of the non-capturing group:

location ~* \.(?!i18n-.*\.js)(?:css|js)$ {
        access_log off;
        add_header Cache-Control "no-transform,public,max-age=31536000,s-maxage=31536000";
        expires 1y;
}

What's the smart solution here? I'm no regex expert, so a brief explanation would be helpful, too.

mavericko asked Mar 02 '17 17:03


2 Answers

The official documentation describes how the location tree is traversed:

Regular expressions are checked, in the order of their appearance in the configuration file. The search of regular expressions terminates on the first match, and the corresponding configuration is used. If no match with a regular expression is found then the configuration of the prefix location remembered earlier is used.

Based on that, the configuration would be as follows:

location ~* /i18n-[^/]*\.js$ {
  access_log off;
  expires off;
}

location ~* \.(css|js)$ {
  access_log off;
  expires 1y;
  add_header Cache-Control public;
}  
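As a rough illustration of the first-match rule quoted above, here is a minimal Python sketch. Python's re module stands in for nginx's regex engine, and the pick_location helper, the labels, and the two patterns are assumptions made up for this example, not part of nginx.

```python
import re

# Ordered like regex location blocks in a config file: the first pattern
# that matches wins. Labels and patterns are illustrative assumptions.
LOCATIONS = [
    ("no-cache", re.compile(r"/i18n-[^/]*\.js$", re.IGNORECASE)),
    ("cache-1y", re.compile(r"\.(css|js)$", re.IGNORECASE)),
]

def pick_location(uri):
    """Return the label of the first pattern that matches, else None."""
    for label, pattern in LOCATIONS:
        if pattern.search(uri):
            return label
    return None

print(pick_location("/js/i18n-en.js"))  # no-cache
print(pick_location("/js/app.js"))      # cache-1y
print(pick_location("/css/site.CSS"))   # cache-1y
print(pick_location("/index.html"))     # None
```

Swapping the two entries in LOCATIONS would make the general rule win for i18n files too, which is why the exclusion has to come first.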

Note: the ?: in your regex is redundant here; group syntax only matters when a capture is used later as a variable. From the docs:

A named regular expression capture can be used later as a variable:

server {
  server_name   ~^(www\.)?(?<domain>.+)$;

  location / {
    root   /sites/$domain;
  }
}

The ?: syntax only serves to keep a group out of the captures that are referenced later. If you don't reference any captures, you can remove it to simplify the location syntax.
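The difference between named, capturing, and non-capturing groups can be tried out in Python. This is only a sketch: Python spells named groups (?P<name>...) rather than the (?<name>...) form used in the nginx example, and the host name and path are made-up test values.

```python
import re

host = "www.example.com"

# Named group: the captured text can be looked up by name afterwards.
named = re.match(r"^(www\.)?(?P<domain>.+)$", host)
print(named.group("domain"))   # example.com

# Capturing vs non-capturing: both match the same text, but only the
# capturing form records the extension as a group.
capturing = re.search(r"\.(css|js)$", "/js/app.js")
non_capturing = re.search(r"\.(?:css|js)$", "/js/app.js")
print(capturing.groups())      # ('js',)
print(non_capturing.groups())  # ()
```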

Anatoly answered Oct 18 '22 15:10


I am sure that Anatoly's answer is a complete solution to your problem. I just wanted to offer more insight than a comment would allow.

Good job on your regular expression. It's a well put-together question, and your expression came pretty close.

Here's why it didn't work:

.               # matches any character except newline
(?!i18n-.*\.js) # A negative lookahead which actually does what you intended it to do
(?:css|js)$     # extension list
  1. In each of your matches, the . was coincidentally matching the literal period before the extension. Without an anchor or assertion, the match was permitted to start there (demo). With no quantifier, all attempts will produce incorrect results.
  2. There's no quantifier after the first period, so in any case it would not match your full filename. Lookaheads evaluate without consuming:
    1. a(?=1) will match a against a1 but will not match against a2.
    2. a(?=1)c will fail against a1c.
    3. a(?=1)1c, a(?=1)\dc, a(?=1).c, etc. would match a1c against a1c.
  3. Your .* needs to come after the lookahead in this case, since the lookahead looks beyond what has been matched up to this point.
    1. Pausing here and looking at a second demo might give you some insight into what it's doing.
    2. As you can see, it realizes at the first character of the first line that the match would fail, so it moves on to the next character.
  4. There's no assertion (such as ^) or reference point character (for example \/), and this is what happens in that situation. Adding one would make your expression work.
    1. A very similar thing happens here: it realizes that the first character won't match, so it begins the search again. It knows that the search requires that, in our example, it start at the beginning of the line, so it starts looking after the next newline.
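The lookahead examples from the list can be checked with a short Python sketch. Python's re module is used here as a stand-in for the regex engine in nginx, and the matches helper and the test path are assumptions for illustration.

```python
import re

def matches(pattern, text):
    """True if the pattern matches anywhere in the text (hypothetical helper)."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# A lookahead checks what follows without consuming it.
print(matches(r"a(?=1)", "a1"))      # True
print(matches(r"a(?=1)", "a2"))      # False
print(matches(r"a(?=1)c", "a1c"))    # False: the 1 is never consumed
print(matches(r"a(?=1)1c", "a1c"))   # True

# The pattern from the question: the lookahead is evaluated right after
# the period, where the remaining text is "js", so it never sees "i18n-".
original = r"\.(?!i18n-.*\.js)(?:css|js)$"
print(matches(original, "/js/i18n-en.js"))  # True: the file is not excluded
```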

It is worth noting, for future reference, that if you wanted to use a reference point character such as \/, you would use an expression like \/(?!i18n-[^\/]*\.js)[^\/]*\.(?:css|js)$; otherwise a path that contains slashes could give unexpected results.
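A slash-anchored pattern of this kind can be tried against a few sample paths in Python. This is a sketch under assumptions: the pattern includes an explicit \. before the extension list, the cacheable helper is hypothetical, and the paths are made up.

```python
import re

# The slash is the reference point: the lookahead is evaluated at the start
# of the final path segment, before any of the file name is consumed.
PATTERN = re.compile(r"/(?!i18n-[^/]*\.js)[^/]*\.(?:css|js)$", re.IGNORECASE)

def cacheable(uri):
    """True if the URI would be picked up by the caching rule (hypothetical)."""
    return PATTERN.search(uri) is not None

print(cacheable("/js/app.js"))           # True
print(cacheable("/css/site.css"))        # True
print(cacheable("/js/i18n-en.js"))       # False
print(cacheable("/i18n-assets/app.js"))  # True: only the file name counts
print(cacheable("/abcjs"))               # False: no period before the extension
```

Using [^/]* instead of .* keeps both the lookahead and the file name match inside one path segment, so a directory that happens to start with i18n- does not disable caching for the files beneath it.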

You had all the elements but, as you said, "I'm no regex expert, so a brief explanation would be helpful, too."

Regular Jo answered Oct 18 '22 14:10