
Most efficient regular expression for Nginx location

What is the most efficient way to define a location directive which matches something like

location = /[0-9a-zA-Z_-]{1,6} { content_by_lua_file ....}

In other words a URI which matches a string from 1 to 6 characters with "-", "_", digits and letters.

Or is it faster to check the string length within my Lua code, which would generate the output, using a location directive like

location  / {content_by_lua_file...}
user1606908 asked Oct 23 '13 20:10



2 Answers

Regular expressions are very efficient at what they do.

When the task is trivial (for instance checking the presence of a particular string), a string function can be faster than a regex—depending on the platform. Here, you are checking both for a character range and a length. It's unlikely that Lua code (compiled at run time) will be faster than the pre-compiled C code of the PCRE regex library used by nginx.

In general, the regex for a string from 1 to 6 characters with "-", "_", digits and letters can be written as

^[-\w]{1,6}$

That is because

  • The ^ anchor asserts that we are at the beginning of the string
  • The \w word character matches letters, digits and the underscore character
  • The $ anchor asserts that we are at the end of the string

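In nginx, the ~ modifier only marks the location as a case-sensitive regular expression match. It does not anchor the pattern, so the ^ must stay. The request URI also begins with a slash, and a regex that contains braces must be quoted so that nginx does not read { as the start of the block. You would write something like this:

location ~ "^/[-\w]{1,6}$" {
    # some rewrite code, for example
    # rewrite ^([^/]+)/?$ /oldsite/$1 break;
}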

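One more morsel of information for the curious: the regex does not carry over directly to a Lua pattern. Lua patterns use % in place of \ to form character classes, but %w matches only letters and digits (not the underscore), and there is no {1,6} counted quantifier. The character check therefore becomes the pattern below, with the length tested separately (for example with the # operator):

^[%w_%-]+$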
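
For comparison, here is a minimal sketch of what the Lua-side check could look like in plain Lua. The function name is_short_token is hypothetical, and the sketch assumes the URI is passed in as a string with its leading slash (in OpenResty that would typically come from ngx.var.uri). It also assumes the default C locale, where %w covers ASCII letters and digits only.

```lua
-- Hypothetical helper, not part of the original answer: returns true when
-- `uri` is "/" followed by 1 to 6 characters from the set [A-Za-z0-9_-].
local function is_short_token(uri)
  -- Capture everything after the leading slash. The pattern is anchored at
  -- both ends, and "+" requires at least one character.
  local token = uri:match("^/([%w_%-]+)$")
  -- Lua patterns have no {m,n} quantifier, so the upper bound on the
  -- length is checked separately with the # operator.
  return token ~= nil and #token <= 6
end

print(is_short_token("/abc-12"))   --> true
print(is_short_token("/abcdefg"))  --> false (7 characters)
```

Whether this beats the regex location is something only a benchmark on your own setup can settle. The sketch just shows that the Lua route needs two checks where the regex needs one.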

Reference

  • ngx_http_rewrite_module
  • Lua Patterns
zx81 answered Oct 19 '22 10:10


In Lua you would have to check not only the length but also the content of the string.
Nginx uses the C library PCRE for regular expressions.
PCRE also offers a JIT mode (PCRE-JIT) that compiles regular expressions to native code, which is particularly useful when the regular expression is more complex than the one in your question. I think the check is faster when done by nginx.
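
If you want to try PCRE-JIT, nginx exposes it through the pcre_jit directive. The fragment below is a sketch that assumes your nginx binary is linked against a PCRE library built with JIT support. For a pattern as simple as the one in the question, the gain may be too small to measure.

```nginx
# Goes in the main (top-level) context of nginx.conf, outside http { }.
# Assumption: nginx is linked against a PCRE library built with JIT support.
pcre_jit on;
```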

Rhim answered Oct 19 '22 10:10