What is the most efficient way to define a location directive which matches something like
location = /[0-9a-zA-Z_-]{1,6} { content_by_lua_file ....}
In other words a URI which matches a string from 1 to 6 characters with "-", "_", digits and letters.
Or is it faster to check string length within my Lua code, which will generate the output by using a location directive like
location / {content_by_lua_file...}
NGINX uses Perl Compatible Regular Expressions (PCRE).
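To find the location that matches a URI, NGINX first scans the locations defined with prefix strings (no regular expression) and remembers the longest match. It then checks the locations defined with regular expressions, in the order of their declaration in the configuration file. The first regular expression that matches is used; if none matches, the remembered prefix location is used. An exact match (=) ends the search immediately, and a longest prefix match marked with ^~ skips the regular expression check.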
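The lookup order described above can be sketched with a small server block. This is only an illustration: the port, the /health and /static/ paths, and the response strings are made up for the example, and the fragment is assumed to sit inside an http block.

```nginx
server {
    listen 8080;

    # Plain prefix location: remembered as a candidate,
    # but regex locations are still checked afterwards.
    location / {
        return 200 "prefix /";
    }

    # Prefix with ^~: if this is the longest matching prefix,
    # the regex locations are not checked at all.
    location ^~ /static/ {
        return 200 "prefix ^~ /static/";
    }

    # Exact match: ends the search as soon as it matches.
    location = /health {
        return 200 "exact /health";
    }

    # Regex location: wins over the plain "/" prefix when it matches.
    # Quoted because the pattern contains curly braces.
    location ~ "^/[-\w]{1,6}$" {
        return 200 "regex short id";
    }
}
```

With this configuration, /abc should be handled by the regex location, /health by the exact location (even though it also fits the regex), /static/a.css by the ^~ location, and /abcdefgh by the plain / prefix.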
Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.
Regular expression matching can be simple and fast, using finite automata-based techniques that have been known for decades. In contrast, Perl, PCRE, Python, Ruby, Java, and many other languages have regular expression implementations based on recursive backtracking that are simple but can be excruciatingly slow.
Regular expressions are very efficient at what they do.
When the task is trivial (for instance checking the presence of a particular string), a string function can be faster than a regex—depending on the platform. Here, you are checking both for a character range and a length. It's unlikely that Lua code (compiled at run time) will be faster than the pre-compiled C code of the PCRE regex library used by nginx.
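In general, the regex for a string from 1 to 6 characters with "-", "_", digits and letters can be written as
^[-\w]{1,6}$
That is because
^ is an anchor that asserts we are at the beginning of the string
[-\w]{1,6} matches one to six characters, each of which is a hyphen or a word character
\w is the word character class, which matches letters, digits and the underscore character
$ is an anchor that asserts we are at the end of the string
In nginx, the ~ modifier marks a location as a case-sensitive regular expression match (~* is the case-insensitive variant). It does not anchor the pattern, so the ^ has to stay. Without it, any URI that merely ends in one to six of these characters, such as /foo/bar/abcdef, would match. The URI that nginx tests always begins with /, so the slash belongs in the pattern as well. Because the pattern contains curly braces, it also has to be quoted, otherwise nginx reads the { as the start of the block. You would write something like this:
location ~ "^/[-\w]{1,6}$" {
# some rewrite code, for example
# rewrite ^([^/]+)/?$ /oldsite/$1 break;
}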
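One more morsel of information for the curious: in Lua itself, the above regex cannot be carried over unchanged to a Lua pattern. Lua patterns use % in place of \ to form character classes, but they have no {1,6} quantifier, and %w matches only letters and digits, so the underscore has to be listed explicitly. The closest Lua pattern is
^[-_%w]+$
combined with a separate check that the string length is between 1 and 6.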
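For comparison, here is a minimal sketch of doing the check on the Lua side behind a catch-all location, assuming OpenResty's ngx_lua module (content_by_lua_block, ngx.var, ngx.exit, ngx.say). Lua patterns have no counted repetition and %w does not include the underscore, so the sketch checks the length separately and lists "_" and "-" explicitly. The response body is only a placeholder.

```nginx
location / {
    content_by_lua_block {
        -- ngx.var.uri includes the leading "/", so strip it first
        local id = ngx.var.uri:sub(2)

        -- length check done by hand, character check done with a Lua pattern
        if #id < 1 or #id > 6 or not id:find("^[-_%w]+$") then
            return ngx.exit(ngx.HTTP_NOT_FOUND)
        end

        -- placeholder output; the real handler would generate the page here
        ngx.say("id: ", id)
    }
}
```

Every request now enters the Lua VM before it can be rejected, whereas a regex location filters non-matching URIs inside nginx itself. If the check has to live in Lua anyway, ngx.re.match with the PCRE pattern ^[-\w]{1,6}$ is another option.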
I think that in Lua you will have to check not only the length, but also the content of the string.
Nginx uses the C library PCRE for regular expressions.
There is also PCRE-JIT, which JIT-compiles regular expressions. It is particularly useful if the regular expression is more complex than the one in your question.
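In nginx, JIT compilation of configuration regexes is switched on with the pcre_jit directive. A minimal fragment, assuming nginx is linked against a PCRE library that was built with JIT support:

```nginx
# main (top-level) context of nginx.conf, outside the http block
pcre_jit on;
```

For a pattern as small as the one in the question the gain is likely to be negligible.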
I think in Nginx it's faster.